Project Team: AJA Analytics

Client: Budweiser Breweries

Project Description: Beer Opportunity and Data Analysis

Introduction

AJA Analytics was contracted by Budweiser Breweries to assist with preliminary data gathering, analysis, and exploration for the company’s planned New Brewery Project. The goal of this initiative is to assess beer and brewery market data, identify trends, and advise on opportunities to monetize the findings.

Interest in specialized beer strengths and flavors is growing. These emerging trends, coupled with other economic factors, make this initiative a very promising investment. We recommend expanding in under-served markets and embarking on a robust marketing strategy targeting key growth areas, guided by a data-driven effort to improve sales and increase revenue.

Datasets

Methodology

To undertake the Exploratory Data Analysis, AJA Analytics chose R as the statistical software environment to gather, clean, explore, and summarize the datasets.

The datasets were received in CSV format, and a preliminary review of the files showed that the data was well organized though incomplete.

First, we used the Breweries dataset to build a summary table listing the number of breweries in each state. As the heatmap in Figure 1.1 shows, the number of breweries varies widely by state. Colorado leads with 47 breweries, and South Dakota is among the states with the fewest.

Bright red represents the states with the most breweries, while black represents the states with the fewest, as seen in Figure 1.1 below.
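The state-level tally described above can be sketched as follows. The analysis itself was carried out in R; this Python version is only an illustration, and the two "Example Brewing" rows and the `breweries_by_state` helper are hypothetical names introduced for the example.

```python
from collections import Counter

# Sample (brewery name, state) rows. The first four appear in the report's
# sample tables; the two "Example Brewing" rows are hypothetical.
breweries = [
    ("NorthGate Brewing", "MN"),
    ("Ukiah Brewing Company", "CA"),
    ("Butternuts Beer and Ale", "NY"),
    ("Sleeping Lady Brewing Company", "AK"),
    ("Example Brewing A", "CO"),
    ("Example Brewing B", "CO"),
]

def breweries_by_state(rows):
    """Return (state, count) pairs ordered from most to fewest breweries."""
    counts = Counter(state.strip() for _, state in rows)
    return counts.most_common()

state_counts = breweries_by_state(breweries)
```

Stripping whitespace from the state code guards against padded values such as " CO", which would otherwise be counted as a separate state.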

Figure 1.1: Number of Breweries by state across the United States

Because some states have few breweries, we grouped the states by their region within the US and used these regions to summarize the data. There are seven regions, as seen in Figure 1.2: Northwest, West, Southwest, Midwest, Southeast, Mid-Atlantic, and Northeast.

Figure 1.2: United States by Regions

Data Cleansing and Preparation

Tables 1.1 and 1.2 show the first six and last six observations of the merged beer and breweries dataset. Table 1.1 illustrates the variables available in the combined dataset.

Table 1.1: Sample data: First 6 Observations ~ Beers with Breweries

Brewery_id  Name           Beer_ID  ABV    IBU  Style                                Ounces  Brewery.Name       City         State
1           Get Together   2692     0.045  50   American IPA                         16      NorthGate Brewing  Minneapolis  MN
1           Maggie’s Leap  2691     0.049  26   Milk / Sweet Stout                   16      NorthGate Brewing  Minneapolis  MN
1           Wall’s End     2690     0.048  19   English Brown Ale                    16      NorthGate Brewing  Minneapolis  MN
1           Pumpion        2689     0.060  38   Pumpkin Ale                          16      NorthGate Brewing  Minneapolis  MN
1           Stronghold     2688     0.060  25   American Porter                      16      NorthGate Brewing  Minneapolis  MN
1           Parapet ESB    2687     0.056  47   Extra Special / Strong Bitter (ESB)  16      NorthGate Brewing  Minneapolis  MN

As seen in Table 1.2, the IBU variable contains “NA” entries, which denote missing values. There were 1,067 missing values in total: 62 in Alcohol Content (ABV) and 1,005 in Bitterness (IBU). We replaced each missing value with the median (middle value) for that style of beer. If the median was zero (0), we used the average instead. This occurred for some sweet beers with low alcohol content.
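The imputation rule can be sketched as below. This is an illustrative Python version rather than the R code used for the analysis. It assumes the fallback average is taken over the same style, which the text does not state explicitly. The `impute_by_style` helper, the field names, and the sample rows are hypothetical.

```python
from statistics import mean, median

def impute_by_style(beers, field):
    """Fill missing `field` values with the per-style median.

    Falls back to the per-style mean when that median is 0 (an assumption
    about how the "use the average" rule was applied).
    """
    observed = {}
    for beer in beers:
        if beer[field] is not None:
            observed.setdefault(beer["style"], []).append(beer[field])

    filled = []
    for beer in beers:
        value = beer[field]
        values = observed.get(beer["style"], [])
        if value is None and values:
            value = median(values)
            if value == 0:
                value = mean(values)
        # A style with no observed values at all stays missing (None).
        filled.append({**beer, field: value})
    return filled

# Hypothetical sample rows; None stands in for R's NA.
sample = [
    {"style": "American IPA", "ibu": 50},
    {"style": "American IPA", "ibu": 70},
    {"style": "American IPA", "ibu": None},
    {"style": "Low Alcohol Beer", "ibu": 0},
    {"style": "Low Alcohol Beer", "ibu": 0},
    {"style": "Low Alcohol Beer", "ibu": 30},
    {"style": "Low Alcohol Beer", "ibu": None},
]
imputed = impute_by_style(sample, "ibu")
```

The median is preferred over the mean here because a few very bitter or very strong beers within a style would otherwise pull the filled value upward.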

Once the regions were integrated and the missing values adjusted, we explored the characteristics of Bitterness and Alcohol Content by region and by state.

Table 1.2: Sample data: Last 6 Observations ~ Beers with Breweries

Row   Brewery_id  Name                       Beer_ID  ABV    IBU  Style                    Ounces  Brewery.Name                   City           State
2405  556         Pilsner Ukiah              98       0.055  NA   German Pilsener          12      Ukiah Brewing Company          Ukiah          CA
2406  557         Heinnieweisse Weissebier   52       0.049  NA   Hefeweizen               12      Butternuts Beer and Ale        Garrattsville  NY
2407  557         Snapperhead IPA            51       0.068  NA   American IPA             12      Butternuts Beer and Ale        Garrattsville  NY
2408  557         Moo Thunder Stout          50       0.049  NA   Milk / Sweet Stout       12      Butternuts Beer and Ale        Garrattsville  NY
2409  557         Porkslap Pale Ale          49       0.043  NA   American Pale Ale (APA)  12      Butternuts Beer and Ale        Garrattsville  NY
2410  558         Urban Wilderness Pale Ale  30       0.049  NA   English Pale Ale         12      Sleeping Lady Brewing Company  Anchorage      AK

Data Characteristics

Bitterness

Bitterness is measured in International Bitterness Units (IBU). The bar chart in Figure 2.1 shows the median bitterness for each region. The gold line over each bar shows the range of IBU values in that region, from the lowest observation at the bottom to the highest at the top. The median values are nearly uniform across regions. The Midwest has the lowest median, which suggests its breweries tend to brew less bitter beers. The Northwest has the highest single observation recorded for bitterness.
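The per-region summary behind a chart like this (median for the bar, minimum and maximum for the range line) can be sketched as follows. This is an illustrative Python version, not the R code used for the analysis. The `summarize_by_region` helper and the IBU values are hypothetical.

```python
from statistics import median

def summarize_by_region(rows):
    """Group (region, value) pairs and return median, min, and max per region."""
    groups = {}
    for region, value in rows:
        groups.setdefault(region, []).append(value)
    return {
        region: {"median": median(values), "min": min(values), "max": max(values)}
        for region, values in groups.items()
    }

# Hypothetical IBU observations, for illustration only.
ibu_rows = [
    ("Northwest", 138), ("Northwest", 40), ("Northwest", 65),
    ("Midwest", 19), ("Midwest", 30),
]
ibu_summary = summarize_by_region(ibu_rows)
```

The same function applies unchanged to ABV values or to a state-level grouping, since it only relies on the grouping key and a numeric value.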

Figure 2.1: Distribution of Medians for International Bitterness Unit (IBU) by Region

Figure 2.2 shows the same breakdown by state. Wisconsin has the lowest median IBU at 19, so based on this sample we could infer a preference for less bitter beers there, whereas Maine has the highest median at 61. Although Maine has the highest median bitterness, it did not have the most bitter beer. That was Oregon’s Bitter Bitch Imperial IPA, with an IBU of 138.

Figure 2.2: Distribution of Medians for International Bitterness Unit (IBU) by State

Alcohol by Volume

Alcohol content by region is depicted in Figure 3.1. The bars show the median values, and the gold lines show the range between the minimum and maximum observations. The regions have similar median values. The Southwest has the highest single observation of alcohol by volume.

Figure 3.1: Distribution of Medians for Alcohol By Volume (ABV) by Region

These characteristics at the state level can be seen in Figure 3.2. Maine and West Virginia have the largest median values, suggesting a preference for stronger beers there, whereas Wisconsin and Kansas have the lowest, meaning their breweries do not typically brew stronger beers. This interpretation assumes that the breweries sampled are representative of the population. The largest value recorded was 12.8% alcohol content, from a Colorado beer in the Southwest region called Lee Hill Series Vol. 5.

Figure 3.2: Distribution of Medians for Alcohol By Volume by State

Summary Statistics

The boxplot in Figure 3.3 visualizes the summary statistics for alcohol content by region. A smaller box means the alcohol content in that region is less varied, while a larger box means it varies more. The black line in each box represents the median alcohol content for that region. The black dots on the graphic are outliers: beers whose alcohol content falls outside the range of most observations in that region.

Figure 3.3: The Distributions of ABV by Regions of the United States

Data Relationships

Figure 4.1 shows the relationship of alcohol content and bitterness for all regional data. We can see the data points cluster around 0.05 ABV and a bitterness of 25.

Figure 4.1: ABV vs IBU all Regions

Visualizing the ABV and IBU relationship by region displays the characteristics with more clarity. The West and Northwest regions show a strong linear relationship between ABV and IBU, so we could assume that as bitterness increases, the alcohol content does as well.
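One way to quantify the strength of a linear relationship like this is the Pearson correlation coefficient. The report does not say that this statistic was computed, so the sketch below is only an illustration, written in Python rather than the R used for the analysis. The `pearson_r` helper is a hypothetical name, and the inputs are the six rows of Table 1.1, which are far too few to draw conclusions from.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# ABV and IBU values from the six sample rows in Table 1.1.
abv = [0.045, 0.049, 0.048, 0.060, 0.060, 0.056]
ibu = [50, 26, 19, 38, 25, 47]
r = pearson_r(abv, ibu)
```

Values of r near 1 indicate that ABV and IBU rise together, values near 0 indicate little linear association, and computing r separately for each region would put a number on the visual comparison.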

Figure 4.2: ABV vs IBU by Region

Beer Classifications

Optimizing Prediction Model

Building on the relationship between ABV and IBU, we tested whether we could predict if a beer is an IPA or an Ale. We used a k-nearest neighbors (k-NN) classifier, which predicts the style of a beer from the known beers whose ABV and IBU values are closest to it. The value “k” is the number of neighboring beers used to decide whether the beer is an IPA or an Ale. We tested a range of k values to determine which one classifies beers most accurately. The most accurate classifier uses k = 5, as shown by the line graph in Figure 5.1.
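A minimal sketch of this procedure follows, in plain Python rather than the R used for the analysis. The training and test points are hypothetical (ABV, IBU) pairs, the function names are illustrative, and the sketch does not rescale the features, since the report does not say whether the analysis did.

```python
from collections import Counter
from math import dist

def knn_predict(train, point, k):
    """Return the majority label among the k training points nearest to `point`."""
    nearest = sorted(train, key=lambda row: dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy_for_k(train, test, k):
    """Share of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, features, k) == label for features, label in test)
    return hits / len(test)

# Hypothetical (ABV, IBU) observations. IBU dominates the distance because it
# is on a much larger scale; rescaling both features would change that.
train = [
    ((0.045, 20), "ALE"), ((0.050, 25), "ALE"),
    ((0.052, 30), "ALE"), ((0.048, 35), "ALE"),
    ((0.065, 60), "IPA"), ((0.070, 70), "IPA"),
    ((0.068, 65), "IPA"), ((0.075, 80), "IPA"),
]
test = [((0.049, 28), "ALE"), ((0.072, 75), "IPA")]

# Try several odd values of k and record the accuracy of each.
scores = {k: accuracy_for_k(train, test, k) for k in (1, 3, 5)}
```

Odd values of k avoid tied votes in a two-class problem, and plotting `scores` against k gives the kind of line graph described above.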

Figure 5.1: The Mean K-NN Value Line graph

Predicting Beer Type by Characteristic

Since the optimal “k” value is 5, we used it to produce the scatterplot in Figure 5.2, which displays the predicted Ale and IPA labels against IBU and ABV. We can see a strong relationship: Ales typically have less bitterness and less alcohol by volume, whereas IPAs have more of both. The Ale threshold, as observed in Figure 5.2, occurs when bitterness is below 50 and alcohol by volume is below 6.5%.

Figure 5.2: Scatterplot of the K-NN Results

Classification Model Performance

Figure 5.3 shows the confusion matrix and model statistics for the predictions from the k-nn classifier with a k value of 5. The model correctly predicted 268 Ales and 141 IPAs, while 32 IPAs were misclassified as Ales and 19 Ales were misclassified as IPAs. This gives an accuracy of 89%, well above the no-information rate of 62%, which is adequate for a classification model.

Confusion Matrix and Statistics

     classifications
      ALE IPA
  ALE 268  32
  IPA  19 141
                                          
               Accuracy : 0.8891          
                 95% CI : (0.8568, 0.9163)
    No Information Rate : 0.6239          
    P-Value [Acc > NIR] : < 2e-16         
                                          
                  Kappa : 0.7602          
                                          
 Mcnemar's Test P-Value : 0.09289         
                                          
            Sensitivity : 0.9338          
            Specificity : 0.8150          
         Pos Pred Value : 0.8933          
         Neg Pred Value : 0.8813          
             Prevalence : 0.6239          
         Detection Rate : 0.5826          
   Detection Prevalence : 0.6522          
      Balanced Accuracy : 0.8744          
                                          
       'Positive' Class : ALE             
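The reported statistics follow directly from the four counts in the matrix. The short check below is an illustrative Python sketch rather than the R used for the analysis. It reads the rows as predictions and the columns as actual classes, which is the reading consistent with the reported sensitivity of 0.9338 (268 of 287 actual Ales).

```python
# Counts from the confusion matrix above, with ALE as the positive class.
tp, fp = 268, 32   # predicted ALE: actually ALE, actually IPA
fn, tn = 19, 141   # predicted IPA: actually ALE, actually IPA
total = tp + fp + fn + tn

accuracy = (tp + tn) / total
sensitivity = tp / (tp + fn)          # share of actual Ales predicted as ALE
specificity = tn / (tn + fp)          # share of actual IPAs predicted as IPA
pos_pred_value = tp / (tp + fp)
neg_pred_value = tn / (tn + fn)
balanced_accuracy = (sensitivity + specificity) / 2

# No-information rate: accuracy from always guessing the more common class.
no_information_rate = max(tp + fn, fp + tn) / total

# Cohen's kappa: agreement beyond what the row and column totals imply by chance.
expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
kappa = (accuracy - expected) / (1 - expected)
```

Rounded to four decimal places, these values match the accuracy, sensitivity, specificity, predictive values, balanced accuracy, no-information rate, and kappa printed in the output.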
                                          

Beers by Category

Findings

Assuming the proportions of beer styles brewed in each region do not represent consumer preference, we would recommend promoting a new line of brews with Belgian or Wild Sour Ale characteristics, since these styles appear to be underrepresented among the beer categories.

Figure 5.4: IBU vs. ABV by beer Category

Recommendations

Assuming the beers brewed in each region closely align with customer preference, we would recommend that Budweiser expand its offering in the types of beers represented by the largest portions of the pie chart.

Budweiser should allocate marketing resources to increase market share and monetize the popularity of specific brews in those regions. For instance, we would recommend that Budweiser make an IPA specifically targeting the Northwest, Northeast, West, and Midwest regions of the United States, because IPA is the most brewed style in those regions, which under this assumption indicates that consumers there prefer it over the other styles of beer.

Figure 6.1: Pie Chart of Beers by Category

Conclusion

More data would need to be investigated before we could make a fully confident recommendation, because of the missing data and our inability to relate the beers to consumer preference.

If Budweiser’s goal is to improve profits and expand market share, we recommend that it release a new product in the Dark Lager category, targeting the Southwest and Midwest regions of the United States, since these regions have larger proportions of this brew than the other regions.