For all of the questions below incorporate the necessary R code directy into your answers. You have 100 mins to complete this exam. At the end of the time period, upload your R Markdown file and the corresponding html file to Canvas. Submit your results before 1:45pm.
In this exam, we will study the liquor sales in Iowa. We will use two
datasets, Iowa_Liquor_Stores.csv and
Story_County_Liquor_Sales_New.csv, which are posted on
Canvas. The first dataset records all the active and
inactive liquor stores in Iowa as 2018/10/01. Please see the link https://data.iowa.gov/Economy/Iowa-Liquor-Stores/ykb6-ywnd
for the information of the variables. You should be able to answer all
of the following questions without the help of this website.
The second dataset contains the spirits purchase information of Iowa Class “E” liquor licensees by product and date of purchase from January 1, 2012 to 2018. The dataset can be used to analyze total spirits sales in Story county of individual products at the store level. Class E liquor license, for grocery stores, liquor stores, convenience stores, etc., allows commercial establishments to sell liquor for off-premises consumption in original unopened containers.
We will first consider the liquor store data
Iowa_Liquor_Stores.csv.
library(tidyverse)
# place the code for your answer here
library(dplyr)
A = read.csv("Iowa_Liquor_Stores.csv", header = TRUE)
str(A)
## 'data.frame': 1849 obs. of 9 variables:
## $ Store : int 2656 2657 2674 2835 2954 3013 3041 3045 3162 3354 ...
## $ Name : chr "Hy-Vee Food Store / Corning" "Hy-Vee Food Store / Bedford" "Hy-Vee Food Store / Lamoni" "CVS Pharmacy #8538 / Cedar Falls" ...
## $ Store.Status : chr "A" "A" "A" "A" ...
## $ Address : chr "300 10th St" "1604 Bent" "720 East Main" "2302 West First St" ...
## $ City : chr "Corning" "Bedford" "Lamoni" "Cedar Falls" ...
## $ State : chr "IA" "IA" "IA" "IA" ...
## $ Zip.Code : chr "51632" "50833" "50140" "50613" ...
## $ Store.Address: chr "300 10th St\nCorning, IA 51632\n(40.991861, -94.731809)" "1604 Bent\nBedford, IA 50833\n(40.676171, -94.725578)" "720 East Main\nLamoni, IA 50140\n(40.623647, -93.924475)" "2302 West First St\nCedar Falls, IA 50613\n(42.539874, -92.472778)" ...
## $ Report.Date : chr "10/01/2018" "10/01/2018" "10/01/2018" "10/01/2018" ...
filter() function in dplyr package to remove
the stores outside Iowa. Answer: There are 6 States.length(unique(A$State))
## [1] 6
#A.outIA = A %>% filter(State != "IA")
#unique(A.outIA$Store)
A1 <-subset(A, State == "IA")
summary(A1$State)
## Length Class Mode
## 1844 character character
#head(A)
A.cityIA = A %>% filter(State == "IA")
length(unique(A.cityIA$City))
## [1] 419
#head(A.cityIA)
Inactive (TRUE or
FALSE) based on Store.Status. This variable equal to
TRUE if Store.Status = "I", otherwise, it is
FALSE. What is the overall percentage of the inactive
liquor store?A = A %>% mutate(Inactive = if_else(A$Store.Status == "I", TRUE, FALSE))
(sum(A$Inactive) / (sum(A$Inactive) + sum(!A$Inactive))) * 100
## [1] 16.76582
#head(A)
group_by() and summarise()
functions in dplyr package to find the number of liquor
stores and the percentage of inactive liquor stores for each city in
Iowa. Answer: There are a total of 1844 stores in cities in IAA.storeIA = A %>% filter(State =="IA")
length(unique(A.storeIA$Store))
## [1] 1844
A.storeIA %>% group_by(City) %>% summarise(sum(Inactive)+sum(!Inactive))
## # A tibble: 419 × 2
## City `sum(Inactive) + sum(!Inactive)`
## <chr> <int>
## 1 Ackley 3
## 2 Adair 2
## 3 Adel 3
## 4 Afton 1
## 5 Akron 2
## 6 Albert City 1
## 7 Albia 2
## 8 Albion 1
## 9 Alburnett 1
## 10 Alden 2
## # … with 409 more rows
A.storeIA %>% group_by(City) %>% summarise((sum(Inactive) / (sum(Inactive) + sum(!Inactive))) * 100)
## # A tibble: 419 × 2
## City `(sum(Inactive)/(sum(Inactive) + sum(!Inactive))) * 100`
## <chr> <dbl>
## 1 Ackley 33.3
## 2 Adair 0
## 3 Adel 0
## 4 Afton 0
## 5 Akron 50
## 6 Albert City 0
## 7 Albia 0
## 8 Albion 0
## 9 Alburnett 0
## 10 Alden 0
## # … with 409 more rows
#head(A)
Arrange the summary data by the number of stores (from the largest to the smallest). Print the summary data for the top 10 cities.
top10 = A.storeIA %>% group_by(City) %>% summarise(n= (sum(Inactive)+sum(!Inactive))) %>% arrange(desc(n))
head(top10, 10)
## # A tibble: 10 × 2
## City n
## <chr> <int>
## 1 Des Moines 108
## 2 Cedar Rapids 92
## 3 Davenport 59
## 4 Waterloo 54
## 5 Council Bluffs 50
## 6 Dubuque 48
## 7 Sioux City 41
## 8 West Des Moines 36
## 9 Iowa City 34
## 10 Ames 30
#head(A)
A2 <-subset(A, Store.Status == "A")
summary(A2$Store.Status)
## Length Class Mode
## 1539 character character
#head(A)
#A.inactive = A %>% filter(Store.Status !="A")
#unique(A.inactive$Store)
group_by() and summarise()
functions to find the number of active liquor stores for each city in
Iowa.A.storeActiveIA = A %>% filter(State =="IA")
A.storeActiveIA %>% group_by(City) %>% summarise(sum(!Inactive))
## # A tibble: 419 × 2
## City `sum(!Inactive)`
## <chr> <int>
## 1 Ackley 2
## 2 Adair 2
## 3 Adel 3
## 4 Afton 1
## 5 Akron 1
## 6 Albert City 1
## 7 Albia 2
## 8 Albion 1
## 9 Alburnett 1
## 10 Alden 2
## # … with 409 more rows
#head(A)
B <- A %>% group_by(City) %>% summarise(n=(sum(!Inactive)))
FourtyAbove <- B %>% filter(n>=40) %>% arrange(desc(n))
head(FourtyAbove)
## # A tibble: 5 × 2
## City n
## <chr> <int>
## 1 Cedar Rapids 78
## 2 Des Moines 78
## 3 Davenport 49
## 4 Council Bluffs 43
## 5 Dubuque 40
_ Make City variable as a factor.
FourtyAbove $ City <- factor(FourtyAbove$City)
Answer: It seems that Cedar Rapids and Des moines have equal number of active storesthen davenport, then council bluff, and then dubuque.
ggplot(FourtyAbove, aes(x = City, y = n )) + geom_bar(stat = "identity")
separate function from the tidyr
package to separate the column Name into two two columns,
separate by ” ” (space character). You may use the codeA.company = A %>% separate(Name, into = c("company", "name2"), sep = " ") %>% select(-name2)
## Warning: Expected 2 pieces. Additional pieces discarded in 1655 rows [1, 2, 3, 4, 5, 7,
## 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, ...].
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 27 rows [41, 112, 162,
## 244, 400, 447, 451, 543, 711, 733, 1013, 1075, 1100, 1126, 1165, 1220, 1332,
## 1390, 1439, 1450, ...].
head(A.company)
## Store company Store.Status Address City State Zip.Code
## 1 2656 Hy-Vee A 300 10th St Corning IA 51632
## 2 2657 Hy-Vee A 1604 Bent Bedford IA 50833
## 3 2674 Hy-Vee A 720 East Main Lamoni IA 50140
## 4 2835 CVS A 2302 West First St Cedar Falls IA 50613
## 5 2954 Dahl's I 3425 Ingersoll Des Moines IA 50300
## 6 3013 Keith's A 207 E Locust St Bloomfield IA 52537
## Store.Address
## 1 300 10th St\nCorning, IA 51632\n(40.991861, -94.731809)
## 2 1604 Bent\nBedford, IA 50833\n(40.676171, -94.725578)
## 3 720 East Main\nLamoni, IA 50140\n(40.623647, -93.924475)
## 4 2302 West First St\nCedar Falls, IA 50613\n(42.539874, -92.472778)
## 5 3425 Ingersoll\nDes Moines, IA 50300\n(41.586313, -93.663337)
## 6 207 E Locust St\nBloomfield, IA 52537\n(40.752691, -92.412847)
## Report.Date Inactive
## 1 10/01/2018 FALSE
## 2 10/01/2018 FALSE
## 3 10/01/2018 FALSE
## 4 10/01/2018 FALSE
## 5 10/01/2018 TRUE
## 6 10/01/2018 FALSE
Identify the top four companies with the most number of active stores
in Iowa. Rename all other stores (other than the top four companies) as
“other” for the variable company.
top_four <- A.company %>% group_by(company) %>% summarise(count = n()) %>% top_n(4, count) %>% pull(company)
A.company$company[!A.company$company %in% top_four] <- "other"
head(A.company)
## Store company Store.Status Address City State Zip.Code
## 1 2656 Hy-Vee A 300 10th St Corning IA 51632
## 2 2657 Hy-Vee A 1604 Bent Bedford IA 50833
## 3 2674 Hy-Vee A 720 East Main Lamoni IA 50140
## 4 2835 other A 2302 West First St Cedar Falls IA 50613
## 5 2954 other I 3425 Ingersoll Des Moines IA 50300
## 6 3013 other A 207 E Locust St Bloomfield IA 52537
## Store.Address
## 1 300 10th St\nCorning, IA 51632\n(40.991861, -94.731809)
## 2 1604 Bent\nBedford, IA 50833\n(40.676171, -94.725578)
## 3 720 East Main\nLamoni, IA 50140\n(40.623647, -93.924475)
## 4 2302 West First St\nCedar Falls, IA 50613\n(42.539874, -92.472778)
## 5 3425 Ingersoll\nDes Moines, IA 50300\n(41.586313, -93.663337)
## 6 207 E Locust St\nBloomfield, IA 52537\n(40.752691, -92.412847)
## Report.Date Inactive
## 1 10/01/2018 FALSE
## 2 10/01/2018 FALSE
## 3 10/01/2018 FALSE
## 4 10/01/2018 FALSE
## 5 10/01/2018 TRUE
## 6 10/01/2018 FALSE
store_counts <- A.company %>% group_by(company, City) %>% summarise(count = n())
## `summarise()` has grouped output by 'company'. You can override using the
## `.groups` argument.
top_four <- store_counts %>% group_by(company) %>% summarise(total_count = sum(count)) %>% top_n(4, total_count) %>% pull(company)
store_counts_filtered <- store_counts %>% filter(company %in% top_four)
store_counts_filtered <- store_counts_filtered %>% group_by(City) %>% summarise(total_count = sum(count)) %>% filter(total_count >= 40) %>% pull(City)
data_filtered <- A.company %>% filter(City %in% store_counts_filtered)
data_filtered$company <- factor(data_filtered$company, levels = top_four[order(store_counts %>% filter(company %in% top_four) %>% group_by(company) %>% summarise(total_count = sum(count)) %>% pull(total_count))])
dodge option for the bar chart.ggplot(data_filtered, aes(x = City, y = ..count.., fill = company)) +
geom_bar(position = "dodge", color = "black")
## Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(count)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Store.Address.A.location1 = A.company %>% separate(Store.Address, into = c("Address", "Location"), sep = "\n\\(")
A.location2 = A.location1 %>% separate(Location, into = c("Latitude", "Longitude"), sep = ",") %>% select(-Address)
A.location = A.location2 %>% separate(Longitude, into = c("Longitude", "drop"), sep = "\\)") %>% select(-drop)
A.location1 = A.company %>% separate(Store.Address, into = c("Address", "Location"), sep = "\n\\(")
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 85 rows [12, 34, 69, 72,
## 86, 145, 151, 186, 198, 209, 247, 255, 343, 377, 379, 392, 409, 435, 464, 465,
## ...].
A.location2 = A.location1 %>% separate(Location, into = c("Latitude", "Longitude"), sep = ",") %>% select(-Address)
A.location = A.location2 %>% separate(Longitude, into = c("Longitude", "drop"), sep = "\\)") %>% select(-drop)
Use ggplot2, draw a scatter plot of the location of all active liquor
stores in Iowa by the lattitude and longtitude. Use different colors for
the variable company. Note that we may need to change the
format of lattitude and longtitude.
#A.location$Latitude <- as.numeric(A.location$Latitude)
#A.location$Longitude <- as.numeric(A.location$Longitude)
ggplot(A.location, aes(x = Longitude, y = Latitude, color = company)) +
geom_point(size=1) + labs(x = "Longitude", y = "Latitude", color = "Company")
A.location$Latitude <- as.numeric(A.location$Latitude)
A.location$Longitude <- as.numeric(A.location$Longitude)
ggplot(A.location, aes(x = Longitude, y = Latitude, color = company)) +
geom_point(size=1) + labs(x = "Longitude", y = "Latitude", color = "Company")
## Warning: Removed 85 rows containing missing values (`geom_point()`).
Now, we use the liquor sale data
Story_County_Liquor_Sales_New.csv. Note that:
Don’t knit your Rmarkdown file often. It will be slow
since this dataset is large. You can run each code chuck or each line
separately once the dataset is read into the environment. You can knit
after everything is ready.
liquor_sales = read.csv("Story_County_Liquor_Sales_New.csv")
str(liquor_sales)
## 'data.frame': 479752 obs. of 19 variables:
## $ Invoice.Item.Number : chr "S29199000001" "S08900500035" "S06459500039" "S04846600029" ...
## $ Date : chr "11/23/2015" "11/13/2012" "07/09/2012" "04/03/2012" ...
## $ Store.Number : int 2501 4509 4105 4509 2500 3943 4102 4105 2501 2500 ...
## $ Store.Name : chr "Hy-vee #2 / Ames" "A J'S LIQUOR II" "Fareway Stores #426 / Nevada" "A J'S LIQUOR II" ...
## $ Address : chr "640 LINCOLNWAY" "2515 CHAMBERLAIN" "1505 SOUTH B AVE" "2515 CHAMBERLAIN" ...
## $ City : chr "AMES" "AMES" "NEVADA" "AMES" ...
## $ Zip.Code : int 50010 50010 50201 50010 50010 50201 50010 50201 50010 50010 ...
## $ County.Number : int 85 85 85 85 85 85 85 85 85 85 ...
## $ County : chr "Story" "Story" "Story" "Story" ...
## $ Category : int NA 1081900 1041100 1032080 1012200 1081600 1011100 1071100 1022100 1012100 ...
## $ Category.Name : chr "" "MISC. AMERICAN CORDIALS & LIQUEURS" "AMERICAN DRY GINS" "IMPORTED VODKA" ...
## $ Vendor.Number : int 255 322 434 35 260 260 260 55 395 55 ...
## $ Vendor.Name : chr "Wilson Daniels Ltd." "Prestige Wine and Spirits Group" "Luxco-St Louis" "Bacardi U.S.A., Inc." ...
## $ Item.Number : int 297 75210 31658 34433 6196 67267 25608 57148 89230 11936 ...
## $ Item.Description : chr "Templeton Rye w/Flask" "Kinky Liqueur" "Paramount Gin" "Grey Goose Vodka" ...
## $ Pack : int 6 6 6 12 6 12 6 6 6 12 ...
## $ Sale..Dollars. : chr "651.36" "180.00" "21.30" "323.52" ...
## $ Volume.Sold..Liters. : chr "18.00" "9.00" "3.50" "9.00" ...
## $ Volume.Sold..Gallons.: num 4.755 2.378 0.925 2.378 0.198 ...
head(liquor_sales,20)
## Invoice.Item.Number Date Store.Number Store.Name
## 1 S29199000001 11/23/2015 2501 Hy-vee #2 / Ames
## 2 S08900500035 11/13/2012 4509 A J'S LIQUOR II
## 3 S06459500039 07/09/2012 4105 Fareway Stores #426 / Nevada
## 4 S04846600029 04/03/2012 4509 A J'S LIQUOR II
## 5 S26899700012 07/22/2015 2500 Hy-Vee Food Store #1 / Ames
## 6 S13555400025 07/24/2013 3943 Good and Quick Co
## 7 S15680400008 11/12/2013 4102 Fareway Stores #386 / Ames
## 8 S12693100008 06/10/2013 4105 Fareway Stores #426 / Nevada
## 9 S20886300293 08/27/2014 2501 Hy-vee #2 / Ames
## 10 S25751000025 05/20/2015 2500 Hy-Vee Food Store #1 / Ames
## 11 S11121300031 03/18/2013 3524 Sam's Club 6568 / Ames
## 12 S23166300009 12/24/2014 4987 Almost Always Open
## 13 S09467400066 12/13/2012 3973 Mmdg Spirits / Ames
## 14 S10667000018 02/18/2013 2500 Hy-Vee Food Store #1 / Ames
## 15 S18341400015 04/09/2014 4105 Fareway Stores #426 / Nevada
## 16 S04368600019 03/05/2012 3524 Sam's Club 6568 / Ames
## 17 S06914500003 08/02/2012 4102 Fareway Stores #386 / Ames
## 18 S28715000049 10/28/2015 4004 Wal-Mart 4256 / Ames
## 19 S21933000129 10/22/2014 2501 Hy-vee #2 / Ames
## 20 S07254600148 08/22/2012 2501 Hy-vee #2 / Ames
## Address City Zip.Code County.Number County Category
## 1 640 LINCOLNWAY AMES 50010 85 Story NA
## 2 2515 CHAMBERLAIN AMES 50010 85 Story 1081900
## 3 1505 SOUTH B AVE NEVADA 50201 85 Story 1041100
## 4 2515 CHAMBERLAIN AMES 50010 85 Story 1032080
## 5 3800 W LINCOLN WAY AMES 50010 85 Story 1012200
## 6 519 LINCOLNWAY NEVADA 50201 85 Story 1081600
## 7 619 BURNETT AVE AMES 50010 85 Story 1011100
## 8 1505 SOUTH B AVE NEVADA 50201 85 Story 1071100
## 9 640 LINCOLNWAY AMES 50010 85 Story 1022100
## 10 3800 W LINCOLN WAY AMES 50010 85 Story 1012100
## 11 305 AIRPORT RD AMES 50010 85 Story 1081600
## 12 419 LINCOLN WAY AMES 50010 85 Story 1031080
## 13 126A WELCH AVE AMES 50014 85 Story 1032200
## 14 3800 W LINCOLN WAY AMES 50010 85 Story 1081390
## 15 1505 SOUTH B AVE NEVADA 50201 85 Story 1032080
## 16 305 AIRPORT RD AMES 50010 85 Story 1011200
## 17 619 BURNETT AVE AMES 50010 85 Story 1062310
## 18 534 S DUFF AVE AMES 50010 85 Story 1062200
## 19 640 LINCOLNWAY AMES 50010 85 Story 1031200
## 20 640 LINCOLNWAY AMES 50010 85 Story 1012100
## Category.Name Vendor.Number
## 1 255
## 2 MISC. AMERICAN CORDIALS & LIQUEURS 322
## 3 AMERICAN DRY GINS 434
## 4 IMPORTED VODKA 35
## 5 SCOTCH WHISKIES 260
## 6 WHISKEY LIQUEUR 260
## 7 BLENDED WHISKIES 260
## 8 AMERICAN COCKTAILS 55
## 9 TEQUILA 395
## 10 CANADIAN WHISKIES 55
## 11 WHISKEY LIQUEUR 259
## 12 VODKA 80 PROOF 205
## 13 IMPORTED VODKA - MISC 260
## 14 IMPORTED SCHNAPPS 421
## 15 IMPORTED VODKA 370
## 16 STRAIGHT BOURBON WHISKIES 421
## 17 SPICED RUM 260
## 18 PUERTO RICO & VIRGIN ISLANDS RUM 260
## 19 VODKA FLAVORED 421
## 20 CANADIAN WHISKIES 65
## Vendor.Name Item.Number
## 1 Wilson Daniels Ltd. 297
## 2 Prestige Wine and Spirits Group 75210
## 3 Luxco-St Louis 31658
## 4 Bacardi U.S.A., Inc. 34433
## 5 Diageo Americas 6196
## 6 Diageo Americas 67267
## 7 Diageo Americas 25608
## 8 Sazerac North America 57148
## 9 Proximo 89230
## 10 Sazerac North America 11936
## 11 Heaven Hill Brands 86796
## 12 E AND J GALLO WINERY 36968
## 13 Diageo Americas 64755
## 14 Sazerac Co., Inc. 69611
## 15 Pernod Ricard USA/Austin Nichols 34006
## 16 Sazerac Co., Inc. 16518
## 17 Diageo Americas 43336
## 18 Diageo Americas 43330
## 19 Sazerac Co., Inc. 41146
## 20 Jim Beam Brands 15244
## Item.Description Pack Sale..Dollars.
## 1 Templeton Rye w/Flask 6 651.36
## 2 Kinky Liqueur 6 180.00
## 3 Paramount Gin 6 21.30
## 4 Grey Goose Vodka 12 323.52
## 5 Talisker Storm 6 51.74
## 6 Yukon Jack Canadian Liqueur 12 35.24
## 7 Seagrams 7 Crown Bl Whiskey 6 107.58
## 8 Chi-Chi's Margarita W/tequila 6 110.88
## 9 1800 Anejo Tequila 6 58.50
## 10 Canadian Ltd Whisky Convenience Pack 12 19.53
## 11 Southern Host Whiskey Liqueur 12 90.60
## 12 New Amsterdam Vodka 24 31.50
## 13 Ciroc Red Berry 12 27.74
## 14 Dr. Mcgillicuddy's Apple Pie Schnapps 12 149.76
## 15 Absolut Swedish Vodka 80 Prf 12 34.48
## 16 Ancient Age Bourbon 6 202.44
## 17 Captain Morgan Original Spiced 12 152.88
## 18 Captain Morgan White 6 162.00
## 19 Sweet Carolina Sweet Tea Flavored Vodka 12 43.64
## 20 Windsor Canadian Supreme Whisky 24 26.04
## Volume.Sold..Liters. Volume.Sold..Gallons.
## 1 18.00 4.7550969
## 2 9.00 2.3775485
## 3 3.50 0.9246022
## 4 9.00 2.3775485
## 5 0.75 0.1981290
## 6 2.00 0.5283441
## 7 10.50 2.7738065
## 8 21.00 5.5476131
## 9 1.50 0.3962581
## 10 2.25 0.5943871
## 11 9.00 2.3775485
## 12 2.25 0.5943871
## 13 0.75 0.1981290
## 14 9.00 2.3775485
## 15 1.50 0.3962581
## 16 21.00 5.5476131
## 17 9.00 2.3775485
## 18 10.50 2.7738065
## 19 3.00 0.7925162
## 20 2.25 0.5943871
Volume.Sold..Liters. and
Sale..Dollars. to numeric variables (Be careful to convert
factor to numeric, first convert factor to character and then character
to numeric.)liquor_sales$Volume.Sold..Liters. <- as.numeric(as.character(liquor_sales$Volume.Sold..Liters.,na.rm = TRUE))
## Warning: NAs introduced by coercion
liquor_sales$Sale..Dollars. <- as.numeric(as.character(liquor_sales$Sale..Dollars.,na.rm = TRUE))
## Warning: NAs introduced by coercion
City. Several of them should be same city but regarded as
different levels in the data set. Fix them.AmesSale <- liquor_sales %>% filter(City %in% c("Ames", "AMES")) %>%
mutate(City = "Ames")
head(AmesSale)
## Invoice.Item.Number Date Store.Number Store.Name
## 1 S29199000001 11/23/2015 2501 Hy-vee #2 / Ames
## 2 S08900500035 11/13/2012 4509 A J'S LIQUOR II
## 3 S04846600029 04/03/2012 4509 A J'S LIQUOR II
## 4 S26899700012 07/22/2015 2500 Hy-Vee Food Store #1 / Ames
## 5 S15680400008 11/12/2013 4102 Fareway Stores #386 / Ames
## 6 S20886300293 08/27/2014 2501 Hy-vee #2 / Ames
## Address City Zip.Code County.Number County Category
## 1 640 LINCOLNWAY Ames 50010 85 Story NA
## 2 2515 CHAMBERLAIN Ames 50010 85 Story 1081900
## 3 2515 CHAMBERLAIN Ames 50010 85 Story 1032080
## 4 3800 W LINCOLN WAY Ames 50010 85 Story 1012200
## 5 619 BURNETT AVE Ames 50010 85 Story 1011100
## 6 640 LINCOLNWAY Ames 50010 85 Story 1022100
## Category.Name Vendor.Number
## 1 255
## 2 MISC. AMERICAN CORDIALS & LIQUEURS 322
## 3 IMPORTED VODKA 35
## 4 SCOTCH WHISKIES 260
## 5 BLENDED WHISKIES 260
## 6 TEQUILA 395
## Vendor.Name Item.Number Item.Description Pack
## 1 Wilson Daniels Ltd. 297 Templeton Rye w/Flask 6
## 2 Prestige Wine and Spirits Group 75210 Kinky Liqueur 6
## 3 Bacardi U.S.A., Inc. 34433 Grey Goose Vodka 12
## 4 Diageo Americas 6196 Talisker Storm 6
## 5 Diageo Americas 25608 Seagrams 7 Crown Bl Whiskey 6
## 6 Proximo 89230 1800 Anejo Tequila 6
## Sale..Dollars. Volume.Sold..Liters. Volume.Sold..Gallons.
## 1 651.36 18.00 4.7550969
## 2 180.00 9.00 2.3775485
## 3 323.52 9.00 2.3775485
## 4 51.74 0.75 0.1981290
## 5 107.58 10.50 2.7738065
## 6 58.50 1.50 0.3962581
Date.AmesSale <- AmesSale %>% separate(Date, into = c("month", "day", "year"), sep = "/")
#mutate(Date = ymd(Date),Year = year(Date), Month = month(Date),Day = day(Date))
#liquor_sales = liquor_sales %>% separate(Date, into = c("month", "day", "year"), sep = "/")
head(AmesSale)
## Invoice.Item.Number month day year Store.Number Store.Name
## 1 S29199000001 11 23 2015 2501 Hy-vee #2 / Ames
## 2 S08900500035 11 13 2012 4509 A J'S LIQUOR II
## 3 S04846600029 04 03 2012 4509 A J'S LIQUOR II
## 4 S26899700012 07 22 2015 2500 Hy-Vee Food Store #1 / Ames
## 5 S15680400008 11 12 2013 4102 Fareway Stores #386 / Ames
## 6 S20886300293 08 27 2014 2501 Hy-vee #2 / Ames
## Address City Zip.Code County.Number County Category
## 1 640 LINCOLNWAY Ames 50010 85 Story NA
## 2 2515 CHAMBERLAIN Ames 50010 85 Story 1081900
## 3 2515 CHAMBERLAIN Ames 50010 85 Story 1032080
## 4 3800 W LINCOLN WAY Ames 50010 85 Story 1012200
## 5 619 BURNETT AVE Ames 50010 85 Story 1011100
## 6 640 LINCOLNWAY Ames 50010 85 Story 1022100
## Category.Name Vendor.Number
## 1 255
## 2 MISC. AMERICAN CORDIALS & LIQUEURS 322
## 3 IMPORTED VODKA 35
## 4 SCOTCH WHISKIES 260
## 5 BLENDED WHISKIES 260
## 6 TEQUILA 395
## Vendor.Name Item.Number Item.Description Pack
## 1 Wilson Daniels Ltd. 297 Templeton Rye w/Flask 6
## 2 Prestige Wine and Spirits Group 75210 Kinky Liqueur 6
## 3 Bacardi U.S.A., Inc. 34433 Grey Goose Vodka 12
## 4 Diageo Americas 6196 Talisker Storm 6
## 5 Diageo Americas 25608 Seagrams 7 Crown Bl Whiskey 6
## 6 Proximo 89230 1800 Anejo Tequila 6
## Sale..Dollars. Volume.Sold..Liters. Volume.Sold..Gallons.
## 1 651.36 18.00 4.7550969
## 2 180.00 9.00 2.3775485
## 3 323.52 9.00 2.3775485
## 4 51.74 0.75 0.1981290
## 5 107.58 10.50 2.7738065
## 6 58.50 1.50 0.3962581
group_by() and summarise()
functions in dplyr package to obtain a summarized data set
of total Volume.Sold..Gallons. and
Sale..Dollars. for different combinations of
month and year. sales_summary <- AmesSale %>%
group_by(year, month) %>%
summarise(Total_Volume_Sold = sum(Volume.Sold..Liters.)*0.264172,
Total_Sales = sum(Sale..Dollars.))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
color option in ggplot2.sales_summary%>% ggplot(aes(x = month, y = Total_Sales, color = as.factor(year))) +
geom_point()
## Warning: Removed 81 rows containing missing values (`geom_point()`).