1. How many breweries are present in each state?
Breweries <- read.csv("~/SMUDataScience/msds6306/Case Study/Breweries.csv")
BreweriesPerState <- summary(Breweries$State)
summary(Breweries$State)
NumberOfBreweriesPerState <- data.frame(BreweriesPerState) NumberOfBreweriesPerState
BreweriesPerState
# A tibble: 50 x 2
State BreweriesPerState
1 AK 7
2 AL 3
3 AR 2
4 AZ 11
5 CA 39
6 CO 47
7 CT 8
8 DE 2
9 FL 15
10 GA 7
Table Of Total Number of Breweries Per State
grid.table(BreweriesPerState)
plot_ly(BreweriesPerState, x = ~State, y = ~BreweriesPerState) %>% add_bars()
Total Number of Breweries per Region
plot_ly(BreweriesPerRegion, x = ~Region, y = ~TotalBreweries) %>% add_bars()
West had the largest number of Breweries with a total of 188, followed by the Midwest with a total of 143.
2. Merge beer data with the breweries data. Print the first 6 observations and the last six observations to check the merged file.
Update col names to match both documents “Brew_id”
names(OnlyStateBreweries) = c("Brewery_id")
Merge data from OnlyStateBreweries & Beers
MegaData <- merge(OnlyStateBreweries, Beers, by = "Brewery_id")
merge(OnlyStateBreweries, Beers, by = "Brewery_id")
Update column names for merged data frame ‘MegaData’
names(MegaData) = c("Brewery_id", "Breweries", "City", "State", "BeerName", "Beer_ID", "ABV", "IBU", "Style", "Ounces")
First 6 observations
head(MegaData)
Brewery_id Breweries City State BeerName
1 1 NorthGate Brewing Minneapolis MN Parapet ESB
2 1 NorthGate Brewing Minneapolis MN Stronghold
3 1 NorthGate Brewing Minneapolis MN Pumpion
4 1 NorthGate Brewing Minneapolis MN Wall's End
5 1 NorthGate Brewing Minneapolis MN Get Together
6 1 NorthGate Brewing Minneapolis MN Maggie's Leap
Beer_ID ABV IBU Style Ounces
1 2687 0.056 47 Extra Special / Strong Bitter (ESB) 16
2 2688 0.060 25 American Porter 16
3 2689 0.060 38 Pumpkin Ale 16
4 2690 0.048 19 English Brown Ale 16
5 2692 0.045 50 American IPA 16
6 2691 0.049 26 Milk / Sweet Stout 16
Last 6 observations
tail(MegaData)
Brewery_id Breweries City State
2397 556 Ukiah Brewing Company Ukiah CA
2398 557 Butternuts Beer and Ale Garrattsville NY
2399 557 Butternuts Beer and Ale Garrattsville NY
2400 557 Butternuts Beer and Ale Garrattsville NY
2401 557 Butternuts Beer and Ale Garrattsville NY
2402 558 Sleeping Lady Brewing Company Anchorage AK
BeerName Beer_ID ABV IBU
2397 Pilsner Ukiah 98 0.055 NA
2398 Porkslap Pale Ale 49 0.043 NA
2399 Snapperhead IPA 51 0.068 NA
2400 Moo Thunder Stout 50 0.049 NA
2401 Heinnieweisse Weissebier 52 0.049 NA
2402 Urban Wilderness Pale Ale 30 0.049 NA
Style Ounces
2397 German Pilsener 12
2398 American Pale Ale (APA) 12
2399 American IPA 12
2400 Milk / Sweet Stout 12
2401 Hefeweizen 12
2402 English Pale Ale 12
3. Report the number of NA’s in each column.
colSums(is.na(MegaData))
Brewery_id Breweries City State BeerName Beer_ID
0 0 0 0 0 0
ABV IBU Style Ounces
62 1001 0 0
NA deleted from within data (ABV & IBU)
MegaData <- MegaData %>% drop_na(ABV)
MegaData <- MegaData %>% drop_na(IBU)
Update data without NA’s
colSums(is.na(MegaData2))
4. Compute the median alcohol content and international bitterness unit for each state. Plot a bar chart to compare.
Median for ABV & IBU
Median_ABV <- data.frame(MegaData %>% group_by(State) %>% summarise(X = median(ABV)))
Median_IBU <- data.frame(MegaData %>% group_by(State) %>% summarise(X = median(IBU)))
Combine ABV & IBU into dataframe
Median_ABV_IBU <- merge(Median_ABV, Median_IBU, by = "State")
Plot Bar Chart
ABV_Percent_Median <- ggplot(MediansDataFrame, aes(State, MedianABV_Percent, label = MedianABV_Percent)) + geom_bar(stat = "identity", color = "steelblue") + xlab("State") + ylab("ABV_Percent_Median")
IBU_Median <- ggplot(MediansDataFrame, aes(State, MedianIBU, label = MedianIBU)) + geom_bar(stat = "identity", color = "green") + xlab("State") + ylab("IBU_Median")
subplot(IBU_Median, ABV_Percent_Median, nrows = 2, margin = 0.05)
The highest median per state for both ABV and IBU was for Maine
5. Which state has the maximum alcoholic (ABV) beer? Which state has the most bitter (IBU) beer?
ABV = Kentucky = .125 (Alcohol Content)
IBU = Oregon = 138 (Bitterness)
6. Summary statistics for the ABV variable.
summary(MegaData2$ABV)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0270 0.0500 0.0570 0.0599 0.0680 0.1250
7. Is there an apparent relationship between the bitterness of the beer and its alcoholic content? Draw a scatter plot.
Correlation
library(psych)
pairs.panels(MegaData2[7:8], gap=0)
Comment: Out of all the numeric variables, ABV and IBU seem to have positive correlation of 0.67. In other words we find that as ABV increases, IBU also increases. However, a correlation of 0.67 does not indicate causation.