1. How many breweries are present in each state?

Breweries <- read.csv("~/SMUDataScience/msds6306/Case Study/Breweries.csv")
BreweriesPerState <- summary(Breweries$State)
summary(Breweries$State)
NumberOfBreweriesPerState <- data.frame(BreweriesPerState) NumberOfBreweriesPerState
BreweriesPerState
# A tibble: 50 x 2
   State BreweriesPerState
 1 AK                    7
 2 AL                    3
 3 AR                    2
 4 AZ                   11
 5 CA                   39
 6 CO                   47
 7 CT                    8
 8 DE                    2
 9 FL                   15
10 GA                    7

Table Of Total Number of Breweries Per State

grid.table(BreweriesPerState)
plot_ly(BreweriesPerState, x = ~State, y = ~BreweriesPerState) %>% add_bars()

Total Number of Breweries per Region

plot_ly(BreweriesPerRegion, x = ~Region, y = ~TotalBreweries) %>% add_bars()

West had the largest number of Breweries with a total of 188, followed by the Midwest with a total of 143.

2. Merge beer data with the breweries data. Print the first 6 observations and the last six observations to check the merged file.

Update col names to match both documents “Brew_id”

names(OnlyStateBreweries) = c("Brewery_id")

Merge data from OnlyStateBreweries & Beers

MegaData <- merge(OnlyStateBreweries, Beers, by = "Brewery_id")
merge(OnlyStateBreweries, Beers, by = "Brewery_id")

Update column names for merged data frame ‘MegaData’

names(MegaData) = c("Brewery_id", "Breweries", "City", "State", "BeerName", "Beer_ID", "ABV", "IBU", "Style", "Ounces")

First 6 observations

head(MegaData)
 Brewery_id          Breweries        City State      BeerName
1          1 NorthGate Brewing  Minneapolis    MN   Parapet ESB
2          1 NorthGate Brewing  Minneapolis    MN    Stronghold
3          1 NorthGate Brewing  Minneapolis    MN       Pumpion
4          1 NorthGate Brewing  Minneapolis    MN    Wall's End
5          1 NorthGate Brewing  Minneapolis    MN  Get Together
6          1 NorthGate Brewing  Minneapolis    MN Maggie's Leap
  Beer_ID   ABV IBU                               Style Ounces
1    2687 0.056  47 Extra Special / Strong Bitter (ESB)     16
2    2688 0.060  25                     American Porter     16
3    2689 0.060  38                         Pumpkin Ale     16
4    2690 0.048  19                   English Brown Ale     16
5    2692 0.045  50                        American IPA     16
6    2691 0.049  26                  Milk / Sweet Stout     16

Last 6 observations

tail(MegaData)
     Brewery_id                     Breweries          City State
2397        556         Ukiah Brewing Company         Ukiah    CA
2398        557       Butternuts Beer and Ale Garrattsville    NY
2399        557       Butternuts Beer and Ale Garrattsville    NY
2400        557       Butternuts Beer and Ale Garrattsville    NY
2401        557       Butternuts Beer and Ale Garrattsville    NY
2402        558 Sleeping Lady Brewing Company     Anchorage    AK
                      BeerName Beer_ID   ABV IBU
2397             Pilsner Ukiah      98 0.055  NA
2398         Porkslap Pale Ale      49 0.043  NA
2399           Snapperhead IPA      51 0.068  NA
2400         Moo Thunder Stout      50 0.049  NA
2401  Heinnieweisse Weissebier      52 0.049  NA
2402 Urban Wilderness Pale Ale      30 0.049  NA
                       Style Ounces
2397         German Pilsener     12
2398 American Pale Ale (APA)     12
2399            American IPA     12
2400      Milk / Sweet Stout     12
2401              Hefeweizen     12
2402        English Pale Ale     12

3. Report the number of NA’s in each column.

colSums(is.na(MegaData))
Brewery_id  Breweries       City      State   BeerName    Beer_ID 
         0          0          0          0          0          0 
       ABV        IBU      Style     Ounces 
        62       1001          0          0 

NA deleted from within data (ABV & IBU)

MegaData <- MegaData %>% drop_na(ABV)
MegaData <- MegaData %>% drop_na(IBU)

Update data without NA’s

colSums(is.na(MegaData2))

4. Compute the median alcohol content and international bitterness unit for each state. Plot a bar chart to compare.

Median for ABV & IBU

Median_ABV <- data.frame(MegaData %>% group_by(State) %>% summarise(X = median(ABV)))
Median_IBU <- data.frame(MegaData %>% group_by(State) %>% summarise(X = median(IBU)))

Combine ABV & IBU into dataframe

Median_ABV_IBU <- merge(Median_ABV, Median_IBU, by = "State")

Plot Bar Chart

ABV_Percent_Median <- ggplot(MediansDataFrame, aes(State, MedianABV_Percent, label = MedianABV_Percent)) + geom_bar(stat = "identity", color = "steelblue") + xlab("State") + ylab("ABV_Percent_Median")       
IBU_Median <- ggplot(MediansDataFrame, aes(State, MedianIBU, label = MedianIBU)) + geom_bar(stat = "identity", color = "green") + xlab("State") + ylab("IBU_Median")    
subplot(IBU_Median, ABV_Percent_Median, nrows = 2, margin = 0.05)    

The highest median per state for both ABV and IBU was for Maine

5. Which state has the maximum alcoholic (ABV) beer? Which state has the most bitter (IBU) beer?

ABV = Kentucky = .125 (Alcohol Content)
IBU = Oregon = 138 (Bitterness)

6. Summary statistics for the ABV variable.

summary(MegaData2$ABV)
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0270  0.0500  0.0570  0.0599  0.0680  0.1250

7. Is there an apparent relationship between the bitterness of the beer and its alcoholic content? Draw a scatter plot.

Correlation

library(psych)
pairs.panels(MegaData2[7:8], gap=0)

Comment: Out of all the numeric variables, ABV and IBU seem to have positive correlation of 0.67. In other words we find that as ABV increases, IBU also increases. However, a correlation of 0.67 does not indicate causation.