Project Team: AJA Analytics
Client: Budweiser Breweries
Project Description: Beer Opportunity and Data Analysis
GitHub Repository: https://github.com/SepAlex/AlexSepe-Project-.git
Introduction
The AJA Analytics team was contracted by Budweiser Breweries to assist with preliminary data gathering, analysis, and exploration for the company’s planned New Brewery Project. The goal of this initiative is to assess beer and brewery market data, identify trends, and advise on opportunities to monetize the findings.
There is growing interest in specialized beer strengths and flavors. These emerging trends, coupled with other economic factors, make this initiative a very promising investment. We recommend expanding into under-served markets and pursuing a robust, data-driven marketing strategy that targets key growth areas to improve sales and increase revenue.
Datasets
Beers dataset: The Beers dataset contains a list of 2410 beer brands currently produced in the United States, including Alcohol by Volume (ABV), Beer ID, International Bitterness Units (IBU), Style, Ounces, and the associated Brewery ID.
Breweries dataset: The Breweries dataset contains a list of 558 breweries, including Brew_ID, headquarters city, and state.
Methodology
To undertake the exploratory data analysis, AJA Analytics chose R as the statistical software environment to gather, clean, explore, and summarize the datasets.
The datasets were received in CSV format, and a preliminary review of the files showed that the data was well organized but incomplete.
First, we used the Breweries dataset to build a summary table listing the number of breweries in each state. As the heatmap in Figure 1.1 shows, the number of breweries varies widely by state. Colorado leads with 47 breweries, and South Dakota is among the states with the fewest.
Bright red represents the states with the most breweries, while black represents the states with the fewest, as seen in Figure 1.1 below.
Figure 1.1: Number of Breweries by state across the United States
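The per-state summary behind the heatmap amounts to a grouped count. The report's analysis was done in R; the following is only a minimal Python sketch of the same idea. The function name `breweries_by_state` and the two rows marked hypothetical are assumptions made for illustration, while the other four ID and state pairs come from Tables 1.1 and 1.2.

```python
from collections import Counter

# Brewery IDs and states from Tables 1.1 and 1.2, plus two made-up rows
# (IDs 9001 and 9002) so that some states appear more than once.
breweries = [
    {"Brew_ID": 1, "State": "MN"},
    {"Brew_ID": 556, "State": "CA"},
    {"Brew_ID": 557, "State": "NY"},
    {"Brew_ID": 558, "State": "AK"},
    {"Brew_ID": 9001, "State": "MN"},   # hypothetical row
    {"Brew_ID": 9002, "State": " CA"},  # hypothetical row with stray whitespace
]


def breweries_by_state(rows):
    """Return (state, count) pairs ordered from most to fewest breweries."""
    # Strip whitespace so " CA" and "CA" are counted as the same state.
    counts = Counter(row["State"].strip() for row in rows)
    return counts.most_common()


state_counts = breweries_by_state(breweries)
for state, count in state_counts:
    print(state, count)
```

Stripping whitespace before counting is a defensive choice; it matters only if the state codes in the CSV file carry leading or trailing spaces.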
Because several states have few breweries, our team grouped the states by region within the US and used these regions to summarize the data. We defined seven regions, as seen in Figure 1.2: Northwest, West, Southwest, Midwest, Southeast, Mid-Atlantic, and Northeast.
Figure 1.2: United States by Regions
Data Cleansing and Preparation
Tables 1.1 and 1.2 show the first six and last six observations of the merged beers and breweries dataset. Table 1.1 illustrates all of the variables available in the combined dataset.
| Brewery_id | Name | Beer_ID | ABV | IBU | Style | Ounces | Brewery.Name | City | State |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Get Together | 2692 | 0.045 | 50 | American IPA | 16 | NorthGate Brewing | Minneapolis | MN |
| 1 | Maggie’s Leap | 2691 | 0.049 | 26 | Milk / Sweet Stout | 16 | NorthGate Brewing | Minneapolis | MN |
| 1 | Wall’s End | 2690 | 0.048 | 19 | English Brown Ale | 16 | NorthGate Brewing | Minneapolis | MN |
| 1 | Pumpion | 2689 | 0.060 | 38 | Pumpkin Ale | 16 | NorthGate Brewing | Minneapolis | MN |
| 1 | Stronghold | 2688 | 0.060 | 25 | American Porter | 16 | NorthGate Brewing | Minneapolis | MN |
| 1 | Parapet ESB | 2687 | 0.056 | 47 | Extra Special / Strong Bitter (ESB) | 16 | NorthGate Brewing | Minneapolis | MN |
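The merged table above is the result of joining each beer to its brewery on the brewery ID. The team worked in R; the sketch below shows the same join in plain Python, assuming an inner join (beers without a matching brewery are dropped), which the report does not state. The function name `merge_beers_breweries` is illustrative, and the sample rows are copied from Tables 1.1 and 1.2.

```python
# Two beers and their breweries, copied from Tables 1.1 and 1.2.
beers = [
    {"Brewery_id": 1, "Name": "Get Together", "Beer_ID": 2692,
     "ABV": 0.045, "IBU": 50, "Style": "American IPA", "Ounces": 16},
    {"Brewery_id": 556, "Name": "Pilsner Ukiah", "Beer_ID": 98,
     "ABV": 0.055, "IBU": None, "Style": "German Pilsener", "Ounces": 12},
]
breweries = [
    {"Brew_ID": 1, "Name": "NorthGate Brewing", "City": "Minneapolis", "State": "MN"},
    {"Brew_ID": 556, "Name": "Ukiah Brewing Company", "City": "Ukiah", "State": "CA"},
]


def merge_beers_breweries(beer_rows, brewery_rows):
    """Join each beer to its brewery on the brewery ID (inner join)."""
    by_id = {brewery["Brew_ID"]: brewery for brewery in brewery_rows}
    merged = []
    for beer in beer_rows:
        brewery = by_id.get(beer["Brewery_id"])
        if brewery is None:
            continue  # assumption: unmatched beers are dropped
        merged.append({
            **beer,
            # Rename the brewery's Name so it does not clash with the beer's Name.
            "Brewery.Name": brewery["Name"],
            "City": brewery["City"],
            "State": brewery["State"],
        })
    return merged


merged = merge_beers_breweries(beers, breweries)
```

Both source tables have a `Name` column, so the brewery name is stored under `Brewery.Name`, matching the column header used in Table 1.1.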
As seen in Table 1.2, the IBU variable contains “NA” values, which denote missing data. There were 1067 missing values in total: 62 in Alcohol Content (ABV) and 1005 in Bitterness (IBU). We replaced each missing value with the median (middle value) for that style of beer. If the median for a style was zero (0), we used the average instead. This occurred for some sweet beers with low alcohol content.
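The imputation rule described above (median by style, falling back to the mean when that median is zero) can be sketched as follows. This is an illustrative Python version, not the team's R code. The function names and the toy values are assumptions, and a style with no observed values at all is left missing because the report does not say how that case was handled.

```python
from statistics import mean, median


def fill_values_by_style(rows, column):
    """Map each style to its fill value: the median, or the mean if the median is 0."""
    observed = {}
    for row in rows:
        if row[column] is not None:
            observed.setdefault(row["Style"], []).append(row[column])
    fill = {}
    for style, values in observed.items():
        mid = median(values)
        fill[style] = mid if mid != 0 else mean(values)
    return fill


def impute(rows, column):
    """Return copies of rows with missing values in `column` filled by style."""
    fill = fill_values_by_style(rows, column)
    result = []
    for row in rows:
        new_row = dict(row)
        if new_row[column] is None:
            # Styles with no observed values stay None (unspecified in the report).
            new_row[column] = fill.get(new_row["Style"])
        result.append(new_row)
    return result


# Toy IBU values, made up for illustration only.
toy = [
    {"Style": "American IPA", "IBU": 50},
    {"Style": "American IPA", "IBU": 70},
    {"Style": "American IPA", "IBU": None},
    {"Style": "Sweet Style", "IBU": 0},
    {"Style": "Sweet Style", "IBU": 0},
    {"Style": "Sweet Style", "IBU": 9},
    {"Style": "Sweet Style", "IBU": None},
]
filled = impute(toy, "IBU")
```

In the toy data the missing American IPA value becomes 60 (the median of 50 and 70), while the missing "Sweet Style" value becomes 3 (the mean of 0, 0, and 9) because that style's median is zero.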
Once the regions were integrated and the missing values imputed, we explored the characteristics of Bitterness and Alcohol Content by region and by state.
| Row | Brewery_id | Name | Beer_ID | ABV | IBU | Style | Ounces | Brewery.Name | City | State |
|---|---|---|---|---|---|---|---|---|---|---|
| 2405 | 556 | Pilsner Ukiah | 98 | 0.055 | NA | German Pilsener | 12 | Ukiah Brewing Company | Ukiah | CA |
| 2406 | 557 | Heinnieweisse Weissebier | 52 | 0.049 | NA | Hefeweizen | 12 | Butternuts Beer and Ale | Garrattsville | NY |
| 2407 | 557 | Snapperhead IPA | 51 | 0.068 | NA | American IPA | 12 | Butternuts Beer and Ale | Garrattsville | NY |
| 2408 | 557 | Moo Thunder Stout | 50 | 0.049 | NA | Milk / Sweet Stout | 12 | Butternuts Beer and Ale | Garrattsville | NY |
| 2409 | 557 | Porkslap Pale Ale | 49 | 0.043 | NA | American Pale Ale (APA) | 12 | Butternuts Beer and Ale | Garrattsville | NY |
| 2410 | 558 | Urban Wilderness Pale Ale | 30 | 0.049 | NA | English Pale Ale | 12 | Sleeping Lady Brewing Company | Anchorage | AK |
Data Characteristics
Bitterness
Bitterness is measured in International Bitterness Units (IBU). The bar chart in Figure 2.1 depicts the median bitterness for each region. The gold line over each bar displays the range of IBU values: the bottom marks the lowest observation recorded in that region and the top marks the highest. The median values are nearly uniform across regions. The Midwest has the lowest median, which suggests its breweries tend to brew less bitter beers. The Northwest has the highest bitterness observation recorded.
Figure 2.1: Distribution of Medians for International Bitterness Unit (IBU) by Region
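Each bar and gold line in the figure corresponds to three numbers per region: the minimum, the median, and the maximum. The report's figures were produced in R; this short Python sketch shows the same grouped summary. The function name `summarize_by_group` and all IBU values in `toy_rows` are made up for illustration and are not taken from the dataset.

```python
from statistics import median


def summarize_by_group(rows, group_key, value_key):
    """Return {group: {"min": ..., "median": ..., "max": ...}} for one column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {
        group: {"min": min(values), "median": median(values), "max": max(values)}
        for group, values in groups.items()
    }


# Hypothetical values, chosen only to make the output easy to check by hand.
toy_rows = [
    {"Region": "Midwest", "IBU": 20},
    {"Region": "Midwest", "IBU": 30},
    {"Region": "Midwest", "IBU": 70},
    {"Region": "Northwest", "IBU": 25},
    {"Region": "Northwest", "IBU": 40},
    {"Region": "Northwest", "IBU": 120},
]
ibu_by_region = summarize_by_group(toy_rows, "Region", "IBU")
```

The same function works for the state-level and ABV charts by changing `group_key` and `value_key`.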
Figure 2.2 breaks the medians down by state. Wisconsin has the lowest median IBU at 19, so based on this sample we could infer a preference for less bitter beers there, whereas Maine has the highest at 61. Although Maine has the highest median bitterness, it did not produce the most bitter beer. That was Oregon’s Bitter Bitch Imperial IPA, with an IBU of 138.
Figure 2.2: Distribution of Medians for International Bitterness Unit (IBU) by State
Alcohol by Volume
Alcohol content by region is depicted in Figure 3.1. The bars show the median values, and the gold lines show the range between the minimum and maximum observations. The regions have similar median values, and the Southwest has the highest ABV observation.
Figure 3.1: Distribution of Medians for Alcohol By Volume (ABV) by Region
These characteristics at the state level can be seen in Figure 3.2. Maine and West Virginia have the highest median values, suggesting their breweries favor stronger beers, whereas Wisconsin and Kansas have the lowest, suggesting they do not typically brew stronger beers. This interpretation assumes that the breweries sampled are representative of the population. The largest value recorded was 12.8% ABV, for Lee Hill Series Vol. 5, a Colorado beer in the Southwest region.
Figure 3.2: Distribution of Medians for Alcohol By Volume by State
Summary Statistics
The boxplot in Figure 3.3 visualizes the summary statistics for alcohol content by region. A smaller box means the alcohol content in that region is less varied, while a larger box means it varies more. The black line in each box represents the median alcohol content for that region. The black dots mark outliers: beers whose alcohol content falls well outside the range of most observations in that region.
Figure 3.3: The Distributions of ABV by Regions of the United States
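The elements of a boxplot can be computed directly: the box spans the first to third quartile, and points beyond the whiskers are drawn as dots. The sketch below is in Python rather than the R used for the report, and it assumes the common 1.5 × IQR whisker rule, which the report does not confirm. The quartile method may also differ slightly from the one used by the plotting function. `box_summary` and the ABV values in `toy_abv` are illustrative.

```python
from statistics import quantiles


def box_summary(values, whisker=1.5):
    """Return quartiles, IQR, and outliers under the whisker * IQR rule."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr, "outliers": outliers}


# Hypothetical ABV values for one region; 0.128 plays the role of a strong outlier.
toy_abv = [0.04, 0.05, 0.05, 0.06, 0.06, 0.07, 0.128]
summary = box_summary(toy_abv)
```

A wide IQR corresponds to a tall box in the figure, and every value in `outliers` would be drawn as a separate dot.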
Data Relationships
Figure 4.1 shows the relationship of alcohol content and bitterness for all regional data. We can see the data points cluster around 0.05 ABV and a bitterness of 25.
Figure 4.1: ABV vs IBU all Regions
Plotting the ABV and IBU relationship by region shows these characteristics more clearly. The West and Northwest regions show a strong linear relationship between ABV and IBU, which suggests that alcohol content tends to increase as bitterness increases.
Figure 4.2: ABV vs IBU by Region
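One way to put a number on how linear a scatterplot looks is the Pearson correlation coefficient, which ranges from -1 to 1. The report does not say whether the team computed it, so the following Python sketch is only an illustration of the idea. `pearson_r` and both toy lists are assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical values for one region: bitterness rises with alcohol content.
toy_ibu = [20, 35, 50, 65, 80]
toy_abv = [0.045, 0.052, 0.060, 0.066, 0.075]
r = pearson_r(toy_ibu, toy_abv)
```

Computing the coefficient separately for each region would allow the visual impression from the regional panels to be compared on a common scale.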
Beer Classifications
Optimizing Prediction Model
Using the relationship between ABV and IBU, we tested whether we could predict if a beer is an IPA or an Ale. We used a k-nearest neighbors (k-nn) classifier, which predicts the style of a beer from the beers whose ABV and IBU values are closest to it. The value “k” is the number of neighboring beers used to decide whether the beer is an IPA or an Ale. We tested a range of k values to find the one that classifies beers most accurately. The most accurate classifier used k = 5, as shown by the line graph in Figure 5.1.
Figure 5.1: The Mean K-NN Value Line graph
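The classifier and the search over k can be sketched in a few lines. This is a plain Python illustration, not the R code used for the report. All data points are made up, ABV is expressed in percent, and the features are not rescaled, so IBU dominates the distance. Whether the team standardized the features is not stated. `math.dist` requires Python 3.8 or later.

```python
from collections import Counter
from math import dist


def knn_predict(train, point, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy_for_k(train, test, k):
    """Share of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, point, k) == label for point, label in test)
    return hits / len(test)


# Hypothetical (ABV in percent, IBU) pairs with their style labels.
train = [
    ((4.5, 20), "ALE"), ((5.0, 25), "ALE"), ((5.2, 30), "ALE"), ((5.5, 35), "ALE"),
    ((6.5, 60), "IPA"), ((6.8, 65), "IPA"), ((7.0, 70), "IPA"), ((7.5, 80), "IPA"),
]
test = [((4.8, 22), "ALE"), ((7.2, 75), "IPA")]

# Evaluate several k values, as in the line graph of accuracy against k.
accuracy_by_k = {k: accuracy_for_k(train, test, k) for k in (1, 3, 5)}
```

With real data the accuracies differ across k, and the k with the highest accuracy on held-out beers is the one to keep. In this toy example every k classifies both test beers correctly.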
Predicting Beer Type by Characteristic
Since the optimal “k” value is 5, we plotted the output of that model. The scatterplot in Figure 5.2 displays the predicted values of Ale and IPA based on their IBU and ABV. We can see a strong relationship: Ales typically have less bitterness and less alcohol by volume, whereas IPAs have more of both. The Ale threshold, as observed in Figure 5.2, occurs when bitterness is less than 50 IBU and alcohol by volume is below 6.5%.
Figure 5.2: Scatterplot of the K-NN Results
Classification Model Performance
Figure 5.3 shows the confusion matrix and model statistics for the predictions from the k-nn classifier with a k value of 5. There were 268 Ales and 141 IPAs that were accurately predicted, and 32 Ales and 19 IPAs that were inaccurately predicted. This gives an accuracy of 89% (409 of 460 beers), well above the 62% no-information rate, which is adequate for a classification model.
Confusion Matrix and Statistics

| | classifications: ALE | classifications: IPA |
|---|---|---|
| ALE | 268 | 32 |
| IPA | 19 | 141 |

| Statistic | Value |
|---|---|
| Accuracy | 0.8891 |
| 95% CI | (0.8568, 0.9163) |
| No Information Rate | 0.6239 |
| P-Value [Acc > NIR] | < 2e-16 |
| Kappa | 0.7602 |
| Mcnemar's Test P-Value | 0.09289 |
| Sensitivity | 0.9338 |
| Specificity | 0.8150 |
| Pos Pred Value | 0.8933 |
| Neg Pred Value | 0.8813 |
| Prevalence | 0.6239 |
| Detection Rate | 0.5826 |
| Detection Prevalence | 0.6522 |
| Balanced Accuracy | 0.8744 |
| 'Positive' Class | ALE |
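Most of the statistics listed above follow from the four cell counts by simple arithmetic. The Python sketch below recomputes them as a check. The variable names are illustrative, and cells are indexed by their (row, column) position in the printed matrix. Sensitivity and specificity use column totals as denominators and the predictive values use row totals, which is the arrangement that reproduces the printed figures.

```python
# Cell counts by (row, column) position in the printed matrix.
ale_ale, ale_ipa = 268, 32   # first row
ipa_ale, ipa_ipa = 19, 141   # second row
total = ale_ale + ale_ipa + ipa_ale + ipa_ipa  # 460 beers

accuracy = (ale_ale + ipa_ipa) / total

# 'Positive' class is ALE: column totals are the denominators here.
sensitivity = ale_ale / (ale_ale + ipa_ale)
specificity = ipa_ipa / (ipa_ipa + ale_ipa)

# Predictive values use the row totals.
pos_pred_value = ale_ale / (ale_ale + ale_ipa)
neg_pred_value = ipa_ipa / (ipa_ipa + ipa_ale)

prevalence = (ale_ale + ipa_ale) / total
detection_rate = ale_ale / total
detection_prevalence = (ale_ale + ale_ipa) / total
balanced_accuracy = (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement compared with agreement expected by chance.
expected = (
    (ale_ale + ale_ipa) * (ale_ale + ipa_ale)
    + (ipa_ale + ipa_ipa) * (ale_ipa + ipa_ipa)
) / total ** 2
kappa = (accuracy - expected) / (1 - expected)

print(f"accuracy={accuracy:.4f} sensitivity={sensitivity:.4f} kappa={kappa:.4f}")
```

Here the no-information rate equals the prevalence (0.6239), the share of the larger class, so the gap between it and the 0.8891 accuracy shows how much the model adds over always guessing that class.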
Beers by Category
Findings
Assuming the proportions of beer styles brewed in each region do not represent consumer preference, we recommend promoting a new line of brews with Belgian or Wild Sour Ale characteristics, since these appear to be underrepresented among beer types.
Figure 5.4: IBU vs. ABV by beer Category
Recommendations
Assuming the beers brewed in each region closely align with customer preference, we recommend that Budweiser expand its offerings in the beer types represented by the largest portions of the pie chart in Figure 6.1.
Budweiser should allocate marketing resources to increase market share and monetize the popularity of specific brews in those regions. For instance, we recommend that Budweiser make an IPA specifically targeting the Northwest, Northeast, West, and Midwest regions of the United States because, under this assumption, consumers in those regions prefer IPA over all other styles of beer.
Figure 6.1: Pie Chart of Beers by Category
Conclusion
More data would need to be investigated before we could make a fully confident recommendation, because of the missing data and our inability to relate the beers to consumer preference.
If Budweiser’s goal is to improve profits and expand market share, we recommend releasing a new product in the Dark Lager category, targeting the Southwest and Midwest regions of the United States, since they have larger proportions of this type of brew than the other regions.