For all of the questions below incorporate the necessary R code directy into your answers. You have 100 mins to complete this exam. At the end of the time period, upload your R Markdown file and the corresponding html file to Canvas. Submit your results before 1:45pm.

Story County Liquor Sales

In this exam, we will study the liquor sales in Iowa. We will use two datasets, Iowa_Liquor_Stores.csv and Story_County_Liquor_Sales_New.csv, which are posted on Canvas. The first dataset records all the active and inactive liquor stores in Iowa as 2018/10/01. Please see the link https://data.iowa.gov/Economy/Iowa-Liquor-Stores/ykb6-ywnd for the information of the variables. You should be able to answer all of the following questions without the help of this website.

The second dataset contains the spirits purchase information of Iowa Class “E” liquor licensees by product and date of purchase from January 1, 2012 to 2018. The dataset can be used to analyze total spirits sales in Story county of individual products at the store level. Class E liquor license, for grocery stores, liquor stores, convenience stores, etc., allows commercial establishments to sell liquor for off-premises consumption in original unopened containers.

We will first consider the liquor store data Iowa_Liquor_Stores.csv.

Question #1 (25 points)

library(tidyverse)
# place the code for your answer here
library(dplyr)

A = read.csv("Iowa_Liquor_Stores.csv", header = TRUE)
  • How many records are in the data set, how many variables? Answer: There are 1849 records and 9 variables
str(A)
## 'data.frame':    1849 obs. of  9 variables:
##  $ Store        : int  2656 2657 2674 2835 2954 3013 3041 3045 3162 3354 ...
##  $ Name         : chr  "Hy-Vee Food Store / Corning" "Hy-Vee Food Store / Bedford" "Hy-Vee Food Store / Lamoni" "CVS Pharmacy #8538 / Cedar Falls" ...
##  $ Store.Status : chr  "A" "A" "A" "A" ...
##  $ Address      : chr  "300  10th St" "1604 Bent" "720 East Main" "2302 West First St" ...
##  $ City         : chr  "Corning" "Bedford" "Lamoni" "Cedar Falls" ...
##  $ State        : chr  "IA" "IA" "IA" "IA" ...
##  $ Zip.Code     : chr  "51632" "50833" "50140" "50613" ...
##  $ Store.Address: chr  "300 10th St\nCorning, IA 51632\n(40.991861, -94.731809)" "1604 Bent\nBedford, IA 50833\n(40.676171, -94.725578)" "720 East Main\nLamoni, IA 50140\n(40.623647, -93.924475)" "2302 West First St\nCedar Falls, IA 50613\n(42.539874, -92.472778)" ...
##  $ Report.Date  : chr  "10/01/2018" "10/01/2018" "10/01/2018" "10/01/2018" ...
  • How many different states are included in the dataset? Use filter() function in dplyr package to remove the stores outside Iowa. Answer: There are 6 States.
length(unique(A$State))
## [1] 6
#A.outIA = A %>% filter(State != "IA")
#unique(A.outIA$Store)
A1 <-subset(A, State == "IA")
summary(A1$State)
##    Length     Class      Mode 
##      1844 character character
#head(A)
  • How many different cities of Iowa are included in the dataset?
    Answer: There are 419 different cities from IA in Dataset
A.cityIA = A %>% filter(State == "IA")
length(unique(A.cityIA$City))
## [1] 419
#head(A.cityIA)
  • Introduce a new logical variable Inactive (TRUE or FALSE) based on Store.Status. This variable equal to TRUE if Store.Status = "I", otherwise, it is FALSE. What is the overall percentage of the inactive liquor store?
    Answer: the overall percentage of the inactive liquor store is 16.7%
A = A %>% mutate(Inactive = if_else(A$Store.Status == "I", TRUE, FALSE))
(sum(A$Inactive) / (sum(A$Inactive) + sum(!A$Inactive))) * 100
## [1] 16.76582
#head(A)
  • Use the group_by() and summarise() functions in dplyr package to find the number of liquor stores and the percentage of inactive liquor stores for each city in Iowa. Answer: There are a total of 1844 stores in cities in IA
A.storeIA = A %>% filter(State =="IA")
length(unique(A.storeIA$Store))
## [1] 1844
A.storeIA %>% group_by(City) %>% summarise(sum(Inactive)+sum(!Inactive))
## # A tibble: 419 × 2
##    City        `sum(Inactive) + sum(!Inactive)`
##    <chr>                                  <int>
##  1 Ackley                                     3
##  2 Adair                                      2
##  3 Adel                                       3
##  4 Afton                                      1
##  5 Akron                                      2
##  6 Albert City                                1
##  7 Albia                                      2
##  8 Albion                                     1
##  9 Alburnett                                  1
## 10 Alden                                      2
## # … with 409 more rows
A.storeIA %>% group_by(City) %>% summarise((sum(Inactive) / (sum(Inactive) + sum(!Inactive))) * 100)
## # A tibble: 419 × 2
##    City        `(sum(Inactive)/(sum(Inactive) + sum(!Inactive))) * 100`
##    <chr>                                                          <dbl>
##  1 Ackley                                                          33.3
##  2 Adair                                                            0  
##  3 Adel                                                             0  
##  4 Afton                                                            0  
##  5 Akron                                                           50  
##  6 Albert City                                                      0  
##  7 Albia                                                            0  
##  8 Albion                                                           0  
##  9 Alburnett                                                        0  
## 10 Alden                                                            0  
## # … with 409 more rows
#head(A)

Arrange the summary data by the number of stores (from the largest to the smallest). Print the summary data for the top 10 cities.

top10 = A.storeIA %>% group_by(City) %>% summarise(n= (sum(Inactive)+sum(!Inactive))) %>% arrange(desc(n))
head(top10, 10)
## # A tibble: 10 × 2
##    City                n
##    <chr>           <int>
##  1 Des Moines        108
##  2 Cedar Rapids       92
##  3 Davenport          59
##  4 Waterloo           54
##  5 Council Bluffs     50
##  6 Dubuque            48
##  7 Sioux City         41
##  8 West Des Moines    36
##  9 Iowa City          34
## 10 Ames               30
#head(A)

Question #2 (15 points)

  • Remove all the inactive stores.
A2 <-subset(A, Store.Status == "A")
summary(A2$Store.Status)
##    Length     Class      Mode 
##      1539 character character
#head(A)

#A.inactive = A %>% filter(Store.Status !="A")
#unique(A.inactive$Store)
  • Use the group_by() and summarise() functions to find the number of active liquor stores for each city in Iowa.
A.storeActiveIA = A %>% filter(State =="IA")
A.storeActiveIA %>% group_by(City) %>% summarise(sum(!Inactive))
## # A tibble: 419 × 2
##    City        `sum(!Inactive)`
##    <chr>                  <int>
##  1 Ackley                     2
##  2 Adair                      2
##  3 Adel                       3
##  4 Afton                      1
##  5 Akron                      1
##  6 Albert City                1
##  7 Albia                      2
##  8 Albion                     1
##  9 Alburnett                  1
## 10 Alden                      2
## # … with 409 more rows
#head(A)
  • Identify the cities with no less than 40 active liquor stores. Only consider those cities for the following two questions.
B <- A %>% group_by(City) %>% summarise(n=(sum(!Inactive)))  
FourtyAbove <- B %>% filter(n>=40) %>% arrange(desc(n))
head(FourtyAbove)
## # A tibble: 5 × 2
##   City               n
##   <chr>          <int>
## 1 Cedar Rapids      78
## 2 Des Moines        78
## 3 Davenport         49
## 4 Council Bluffs    43
## 5 Dubuque           40

_ Make City variable as a factor.

FourtyAbove $ City <- factor(FourtyAbove$City)
  • Using ggplot2, draw a bar plot that shows the number of the liquor stores for those selected cities with no less than 40 active stores. Make sure the levels of the cities are ordered based on the number of stores.

Answer: It seems that Cedar Rapids and Des moines have equal number of active storesthen davenport, then council bluff, and then dubuque.

ggplot(FourtyAbove, aes(x = City, y = n )) + geom_bar(stat = "identity")

Question #3 (20 points)

  • Use separate function from the tidyr package to separate the column Name into two two columns, separate by ” ” (space character). You may use the code
A.company = A %>% separate(Name, into = c("company", "name2"), sep = " ") %>% select(-name2)
## Warning: Expected 2 pieces. Additional pieces discarded in 1655 rows [1, 2, 3, 4, 5, 7,
## 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, ...].
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 27 rows [41, 112, 162,
## 244, 400, 447, 451, 543, 711, 733, 1013, 1075, 1100, 1126, 1165, 1220, 1332,
## 1390, 1439, 1450, ...].
head(A.company)
##   Store company Store.Status            Address        City State Zip.Code
## 1  2656  Hy-Vee            A       300  10th St     Corning    IA    51632
## 2  2657  Hy-Vee            A          1604 Bent     Bedford    IA    50833
## 3  2674  Hy-Vee            A      720 East Main      Lamoni    IA    50140
## 4  2835     CVS            A 2302 West First St Cedar Falls    IA    50613
## 5  2954  Dahl's            I     3425 Ingersoll  Des Moines    IA    50300
## 6  3013 Keith's            A    207 E Locust St  Bloomfield    IA    52537
##                                                        Store.Address
## 1            300 10th St\nCorning, IA 51632\n(40.991861, -94.731809)
## 2              1604 Bent\nBedford, IA 50833\n(40.676171, -94.725578)
## 3           720 East Main\nLamoni, IA 50140\n(40.623647, -93.924475)
## 4 2302 West First St\nCedar Falls, IA 50613\n(42.539874, -92.472778)
## 5      3425 Ingersoll\nDes Moines, IA 50300\n(41.586313, -93.663337)
## 6     207 E Locust St\nBloomfield, IA 52537\n(40.752691, -92.412847)
##   Report.Date Inactive
## 1  10/01/2018    FALSE
## 2  10/01/2018    FALSE
## 3  10/01/2018    FALSE
## 4  10/01/2018    FALSE
## 5  10/01/2018     TRUE
## 6  10/01/2018    FALSE

Identify the top four companies with the most number of active stores in Iowa. Rename all other stores (other than the top four companies) as “other” for the variable company.

top_four <- A.company %>% group_by(company) %>% summarise(count = n()) %>%  top_n(4, count) %>% pull(company)

A.company$company[!A.company$company %in% top_four] <- "other"

head(A.company)
##   Store company Store.Status            Address        City State Zip.Code
## 1  2656  Hy-Vee            A       300  10th St     Corning    IA    51632
## 2  2657  Hy-Vee            A          1604 Bent     Bedford    IA    50833
## 3  2674  Hy-Vee            A      720 East Main      Lamoni    IA    50140
## 4  2835   other            A 2302 West First St Cedar Falls    IA    50613
## 5  2954   other            I     3425 Ingersoll  Des Moines    IA    50300
## 6  3013   other            A    207 E Locust St  Bloomfield    IA    52537
##                                                        Store.Address
## 1            300 10th St\nCorning, IA 51632\n(40.991861, -94.731809)
## 2              1604 Bent\nBedford, IA 50833\n(40.676171, -94.725578)
## 3           720 East Main\nLamoni, IA 50140\n(40.623647, -93.924475)
## 4 2302 West First St\nCedar Falls, IA 50613\n(42.539874, -92.472778)
## 5      3425 Ingersoll\nDes Moines, IA 50300\n(41.586313, -93.663337)
## 6     207 E Locust St\nBloomfield, IA 52537\n(40.752691, -92.412847)
##   Report.Date Inactive
## 1  10/01/2018    FALSE
## 2  10/01/2018    FALSE
## 3  10/01/2018    FALSE
## 4  10/01/2018    FALSE
## 5  10/01/2018     TRUE
## 6  10/01/2018    FALSE
  • Create a new data set only include the stores from those four companies and the cities with no less than 40 active stores. Order the level of the companies by their total number of stores.
store_counts <- A.company %>% group_by(company, City) %>% summarise(count = n())
## `summarise()` has grouped output by 'company'. You can override using the
## `.groups` argument.
top_four <- store_counts %>% group_by(company) %>% summarise(total_count = sum(count)) %>% top_n(4, total_count) %>% pull(company)

store_counts_filtered <- store_counts %>% filter(company %in% top_four)

store_counts_filtered <- store_counts_filtered %>% group_by(City) %>% summarise(total_count = sum(count)) %>% filter(total_count >= 40) %>% pull(City)
data_filtered <- A.company %>% filter(City %in% store_counts_filtered)

data_filtered$company <- factor(data_filtered$company, levels = top_four[order(store_counts %>% filter(company %in% top_four) %>% group_by(company) %>% summarise(total_count = sum(count))  %>% pull(total_count))])
  • Using the new data set, make a bar chart to compare the number of stores for those four companies for the cities with no less than 40 active stores. Please use different color to identify the four companies, and use the dodge option for the bar chart.
ggplot(data_filtered, aes(x = City, y = ..count.., fill = company)) +
  geom_bar(position = "dodge", color = "black") 
## Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(count)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Question #4 (15 points)

  • Use the following code to extract the lattitude and longtitude from the variable Store.Address.
A.location1 = A.company %>% separate(Store.Address, into = c("Address", "Location"), sep = "\n\\(")
A.location2 = A.location1 %>% separate(Location, into = c("Latitude", "Longitude"), sep = ",") %>% select(-Address)
A.location = A.location2 %>% separate(Longitude, into = c("Longitude", "drop"), sep = "\\)") %>% select(-drop)
A.location1 = A.company %>% separate(Store.Address, into = c("Address", "Location"), sep = "\n\\(")
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 85 rows [12, 34, 69, 72,
## 86, 145, 151, 186, 198, 209, 247, 255, 343, 377, 379, 392, 409, 435, 464, 465,
## ...].
A.location2 = A.location1 %>% separate(Location, into = c("Latitude", "Longitude"), sep = ",") %>% select(-Address)
A.location = A.location2 %>% separate(Longitude, into = c("Longitude", "drop"), sep = "\\)") %>% select(-drop)

Use ggplot2, draw a scatter plot of the location of all active liquor stores in Iowa by the lattitude and longtitude. Use different colors for the variable company. Note that we may need to change the format of lattitude and longtitude.

#A.location$Latitude <- as.numeric(A.location$Latitude)
#A.location$Longitude <- as.numeric(A.location$Longitude)

ggplot(A.location, aes(x = Longitude, y = Latitude, color = company)) +
  geom_point(size=1) +  labs(x = "Longitude", y = "Latitude", color = "Company")

A.location$Latitude <- as.numeric(A.location$Latitude)
A.location$Longitude <- as.numeric(A.location$Longitude)

ggplot(A.location, aes(x = Longitude, y = Latitude, color = company)) +
  geom_point(size=1) +  labs(x = "Longitude", y = "Latitude", color = "Company")
## Warning: Removed 85 rows containing missing values (`geom_point()`).

Now, we use the liquor sale data Story_County_Liquor_Sales_New.csv. Note that: Don’t knit your Rmarkdown file often. It will be slow since this dataset is large. You can run each code chuck or each line separately once the dataset is read into the environment. You can knit after everything is ready.

Question #5 (25 points)

liquor_sales = read.csv("Story_County_Liquor_Sales_New.csv")
str(liquor_sales)
## 'data.frame':    479752 obs. of  19 variables:
##  $ Invoice.Item.Number  : chr  "S29199000001" "S08900500035" "S06459500039" "S04846600029" ...
##  $ Date                 : chr  "11/23/2015" "11/13/2012" "07/09/2012" "04/03/2012" ...
##  $ Store.Number         : int  2501 4509 4105 4509 2500 3943 4102 4105 2501 2500 ...
##  $ Store.Name           : chr  "Hy-vee  #2 / Ames" "A J'S LIQUOR II" "Fareway Stores #426 / Nevada" "A J'S LIQUOR II" ...
##  $ Address              : chr  "640 LINCOLNWAY" "2515 CHAMBERLAIN" "1505 SOUTH B AVE" "2515 CHAMBERLAIN" ...
##  $ City                 : chr  "AMES" "AMES" "NEVADA" "AMES" ...
##  $ Zip.Code             : int  50010 50010 50201 50010 50010 50201 50010 50201 50010 50010 ...
##  $ County.Number        : int  85 85 85 85 85 85 85 85 85 85 ...
##  $ County               : chr  "Story" "Story" "Story" "Story" ...
##  $ Category             : int  NA 1081900 1041100 1032080 1012200 1081600 1011100 1071100 1022100 1012100 ...
##  $ Category.Name        : chr  "" "MISC. AMERICAN CORDIALS & LIQUEURS" "AMERICAN DRY GINS" "IMPORTED VODKA" ...
##  $ Vendor.Number        : int  255 322 434 35 260 260 260 55 395 55 ...
##  $ Vendor.Name          : chr  "Wilson Daniels Ltd." "Prestige Wine and Spirits Group" "Luxco-St Louis" "Bacardi U.S.A., Inc." ...
##  $ Item.Number          : int  297 75210 31658 34433 6196 67267 25608 57148 89230 11936 ...
##  $ Item.Description     : chr  "Templeton Rye w/Flask" "Kinky Liqueur" "Paramount Gin" "Grey Goose Vodka" ...
##  $ Pack                 : int  6 6 6 12 6 12 6 6 6 12 ...
##  $ Sale..Dollars.       : chr  "651.36" "180.00" "21.30" "323.52" ...
##  $ Volume.Sold..Liters. : chr  "18.00" "9.00" "3.50" "9.00" ...
##  $ Volume.Sold..Gallons.: num  4.755 2.378 0.925 2.378 0.198 ...
head(liquor_sales,20)
##    Invoice.Item.Number       Date Store.Number                   Store.Name
## 1         S29199000001 11/23/2015         2501            Hy-vee  #2 / Ames
## 2         S08900500035 11/13/2012         4509              A J'S LIQUOR II
## 3         S06459500039 07/09/2012         4105 Fareway Stores #426 / Nevada
## 4         S04846600029 04/03/2012         4509              A J'S LIQUOR II
## 5         S26899700012 07/22/2015         2500  Hy-Vee Food Store #1 / Ames
## 6         S13555400025 07/24/2013         3943            Good and Quick Co
## 7         S15680400008 11/12/2013         4102   Fareway Stores #386 / Ames
## 8         S12693100008 06/10/2013         4105 Fareway Stores #426 / Nevada
## 9         S20886300293 08/27/2014         2501            Hy-vee  #2 / Ames
## 10        S25751000025 05/20/2015         2500  Hy-Vee Food Store #1 / Ames
## 11        S11121300031 03/18/2013         3524       Sam's Club 6568 / Ames
## 12        S23166300009 12/24/2014         4987           Almost Always Open
## 13        S09467400066 12/13/2012         3973          Mmdg Spirits / Ames
## 14        S10667000018 02/18/2013         2500  Hy-Vee Food Store #1 / Ames
## 15        S18341400015 04/09/2014         4105 Fareway Stores #426 / Nevada
## 16        S04368600019 03/05/2012         3524       Sam's Club 6568 / Ames
## 17        S06914500003 08/02/2012         4102   Fareway Stores #386 / Ames
## 18        S28715000049 10/28/2015         4004         Wal-Mart 4256 / Ames
## 19        S21933000129 10/22/2014         2501            Hy-vee  #2 / Ames
## 20        S07254600148 08/22/2012         2501            Hy-vee  #2 / Ames
##               Address   City Zip.Code County.Number County Category
## 1      640 LINCOLNWAY   AMES    50010            85  Story       NA
## 2    2515 CHAMBERLAIN   AMES    50010            85  Story  1081900
## 3    1505 SOUTH B AVE NEVADA    50201            85  Story  1041100
## 4    2515 CHAMBERLAIN   AMES    50010            85  Story  1032080
## 5  3800 W LINCOLN WAY   AMES    50010            85  Story  1012200
## 6      519 LINCOLNWAY NEVADA    50201            85  Story  1081600
## 7     619 BURNETT AVE   AMES    50010            85  Story  1011100
## 8    1505 SOUTH B AVE NEVADA    50201            85  Story  1071100
## 9      640 LINCOLNWAY   AMES    50010            85  Story  1022100
## 10 3800 W LINCOLN WAY   AMES    50010            85  Story  1012100
## 11     305 AIRPORT RD   AMES    50010            85  Story  1081600
## 12    419 LINCOLN WAY   AMES    50010            85  Story  1031080
## 13     126A WELCH AVE   AMES    50014            85  Story  1032200
## 14 3800 W LINCOLN WAY   AMES    50010            85  Story  1081390
## 15   1505 SOUTH B AVE NEVADA    50201            85  Story  1032080
## 16     305 AIRPORT RD   AMES    50010            85  Story  1011200
## 17    619 BURNETT AVE   AMES    50010            85  Story  1062310
## 18     534 S DUFF AVE   AMES    50010            85  Story  1062200
## 19     640 LINCOLNWAY   AMES    50010            85  Story  1031200
## 20     640 LINCOLNWAY   AMES    50010            85  Story  1012100
##                         Category.Name Vendor.Number
## 1                                               255
## 2  MISC. AMERICAN CORDIALS & LIQUEURS           322
## 3                   AMERICAN DRY GINS           434
## 4                      IMPORTED VODKA            35
## 5                     SCOTCH WHISKIES           260
## 6                     WHISKEY LIQUEUR           260
## 7                    BLENDED WHISKIES           260
## 8                  AMERICAN COCKTAILS            55
## 9                             TEQUILA           395
## 10                  CANADIAN WHISKIES            55
## 11                    WHISKEY LIQUEUR           259
## 12                     VODKA 80 PROOF           205
## 13              IMPORTED VODKA - MISC           260
## 14                  IMPORTED SCHNAPPS           421
## 15                     IMPORTED VODKA           370
## 16          STRAIGHT BOURBON WHISKIES           421
## 17                         SPICED RUM           260
## 18   PUERTO RICO & VIRGIN ISLANDS RUM           260
## 19                     VODKA FLAVORED           421
## 20                  CANADIAN WHISKIES            65
##                         Vendor.Name Item.Number
## 1               Wilson Daniels Ltd.         297
## 2   Prestige Wine and Spirits Group       75210
## 3                    Luxco-St Louis       31658
## 4              Bacardi U.S.A., Inc.       34433
## 5                   Diageo Americas        6196
## 6                   Diageo Americas       67267
## 7                   Diageo Americas       25608
## 8             Sazerac North America       57148
## 9                           Proximo       89230
## 10            Sazerac North America       11936
## 11               Heaven Hill Brands       86796
## 12             E AND J GALLO WINERY       36968
## 13                  Diageo Americas       64755
## 14                Sazerac Co., Inc.       69611
## 15 Pernod Ricard USA/Austin Nichols       34006
## 16                Sazerac Co., Inc.       16518
## 17                  Diageo Americas       43336
## 18                  Diageo Americas       43330
## 19                Sazerac Co., Inc.       41146
## 20                  Jim Beam Brands       15244
##                           Item.Description Pack Sale..Dollars.
## 1                    Templeton Rye w/Flask    6         651.36
## 2                            Kinky Liqueur    6         180.00
## 3                            Paramount Gin    6          21.30
## 4                         Grey Goose Vodka   12         323.52
## 5                           Talisker Storm    6          51.74
## 6              Yukon Jack Canadian Liqueur   12          35.24
## 7              Seagrams 7 Crown Bl Whiskey    6         107.58
## 8            Chi-Chi's Margarita W/tequila    6         110.88
## 9                       1800 Anejo Tequila    6          58.50
## 10    Canadian Ltd Whisky Convenience Pack   12          19.53
## 11           Southern Host Whiskey Liqueur   12          90.60
## 12                     New Amsterdam Vodka   24          31.50
## 13                         Ciroc Red Berry   12          27.74
## 14   Dr. Mcgillicuddy's Apple Pie Schnapps   12         149.76
## 15            Absolut Swedish Vodka 80 Prf   12          34.48
## 16                     Ancient Age Bourbon    6         202.44
## 17          Captain Morgan Original Spiced   12         152.88
## 18                    Captain Morgan White    6         162.00
## 19 Sweet Carolina Sweet Tea Flavored Vodka   12          43.64
## 20         Windsor Canadian Supreme Whisky   24          26.04
##    Volume.Sold..Liters. Volume.Sold..Gallons.
## 1                 18.00             4.7550969
## 2                  9.00             2.3775485
## 3                  3.50             0.9246022
## 4                  9.00             2.3775485
## 5                  0.75             0.1981290
## 6                  2.00             0.5283441
## 7                 10.50             2.7738065
## 8                 21.00             5.5476131
## 9                  1.50             0.3962581
## 10                 2.25             0.5943871
## 11                 9.00             2.3775485
## 12                 2.25             0.5943871
## 13                 0.75             0.1981290
## 14                 9.00             2.3775485
## 15                 1.50             0.3962581
## 16                21.00             5.5476131
## 17                 9.00             2.3775485
## 18                10.50             2.7738065
## 19                 3.00             0.7925162
## 20                 2.25             0.5943871
  • Convert Volume.Sold..Liters. and Sale..Dollars. to numeric variables (Be careful to convert factor to numeric, first convert factor to character and then character to numeric.)
liquor_sales$Volume.Sold..Liters. <- as.numeric(as.character(liquor_sales$Volume.Sold..Liters.,na.rm = TRUE))
## Warning: NAs introduced by coercion
liquor_sales$Sale..Dollars. <- as.numeric(as.character(liquor_sales$Sale..Dollars.,na.rm = TRUE))
## Warning: NAs introduced by coercion
  • Only consider the data from Ames. Check the variable City. Several of them should be same city but regarded as different levels in the data set. Fix them.
AmesSale <- liquor_sales %>% filter(City %in% c("Ames", "AMES")) %>%
  mutate(City = "Ames")
head(AmesSale)
##   Invoice.Item.Number       Date Store.Number                  Store.Name
## 1        S29199000001 11/23/2015         2501           Hy-vee  #2 / Ames
## 2        S08900500035 11/13/2012         4509             A J'S LIQUOR II
## 3        S04846600029 04/03/2012         4509             A J'S LIQUOR II
## 4        S26899700012 07/22/2015         2500 Hy-Vee Food Store #1 / Ames
## 5        S15680400008 11/12/2013         4102  Fareway Stores #386 / Ames
## 6        S20886300293 08/27/2014         2501           Hy-vee  #2 / Ames
##              Address City Zip.Code County.Number County Category
## 1     640 LINCOLNWAY Ames    50010            85  Story       NA
## 2   2515 CHAMBERLAIN Ames    50010            85  Story  1081900
## 3   2515 CHAMBERLAIN Ames    50010            85  Story  1032080
## 4 3800 W LINCOLN WAY Ames    50010            85  Story  1012200
## 5    619 BURNETT AVE Ames    50010            85  Story  1011100
## 6     640 LINCOLNWAY Ames    50010            85  Story  1022100
##                        Category.Name Vendor.Number
## 1                                              255
## 2 MISC. AMERICAN CORDIALS & LIQUEURS           322
## 3                     IMPORTED VODKA            35
## 4                    SCOTCH WHISKIES           260
## 5                   BLENDED WHISKIES           260
## 6                            TEQUILA           395
##                       Vendor.Name Item.Number            Item.Description Pack
## 1             Wilson Daniels Ltd.         297       Templeton Rye w/Flask    6
## 2 Prestige Wine and Spirits Group       75210               Kinky Liqueur    6
## 3            Bacardi U.S.A., Inc.       34433            Grey Goose Vodka   12
## 4                 Diageo Americas        6196              Talisker Storm    6
## 5                 Diageo Americas       25608 Seagrams 7 Crown Bl Whiskey    6
## 6                         Proximo       89230          1800 Anejo Tequila    6
##   Sale..Dollars. Volume.Sold..Liters. Volume.Sold..Gallons.
## 1         651.36                18.00             4.7550969
## 2         180.00                 9.00             2.3775485
## 3         323.52                 9.00             2.3775485
## 4          51.74                 0.75             0.1981290
## 5         107.58                10.50             2.7738065
## 6          58.50                 1.50             0.3962581
  • Extract year, month and day from the variable Date.
AmesSale <- AmesSale %>% separate(Date, into = c("month", "day", "year"), sep = "/")

#mutate(Date = ymd(Date),Year = year(Date), Month = month(Date),Day = day(Date))
#liquor_sales  = liquor_sales  %>%  separate(Date, into = c("month", "day", "year"), sep = "/")
head(AmesSale)
##   Invoice.Item.Number month day year Store.Number                  Store.Name
## 1        S29199000001    11  23 2015         2501           Hy-vee  #2 / Ames
## 2        S08900500035    11  13 2012         4509             A J'S LIQUOR II
## 3        S04846600029    04  03 2012         4509             A J'S LIQUOR II
## 4        S26899700012    07  22 2015         2500 Hy-Vee Food Store #1 / Ames
## 5        S15680400008    11  12 2013         4102  Fareway Stores #386 / Ames
## 6        S20886300293    08  27 2014         2501           Hy-vee  #2 / Ames
##              Address City Zip.Code County.Number County Category
## 1     640 LINCOLNWAY Ames    50010            85  Story       NA
## 2   2515 CHAMBERLAIN Ames    50010            85  Story  1081900
## 3   2515 CHAMBERLAIN Ames    50010            85  Story  1032080
## 4 3800 W LINCOLN WAY Ames    50010            85  Story  1012200
## 5    619 BURNETT AVE Ames    50010            85  Story  1011100
## 6     640 LINCOLNWAY Ames    50010            85  Story  1022100
##                        Category.Name Vendor.Number
## 1                                              255
## 2 MISC. AMERICAN CORDIALS & LIQUEURS           322
## 3                     IMPORTED VODKA            35
## 4                    SCOTCH WHISKIES           260
## 5                   BLENDED WHISKIES           260
## 6                            TEQUILA           395
##                       Vendor.Name Item.Number            Item.Description Pack
## 1             Wilson Daniels Ltd.         297       Templeton Rye w/Flask    6
## 2 Prestige Wine and Spirits Group       75210               Kinky Liqueur    6
## 3            Bacardi U.S.A., Inc.       34433            Grey Goose Vodka   12
## 4                 Diageo Americas        6196              Talisker Storm    6
## 5                 Diageo Americas       25608 Seagrams 7 Crown Bl Whiskey    6
## 6                         Proximo       89230          1800 Anejo Tequila    6
##   Sale..Dollars. Volume.Sold..Liters. Volume.Sold..Gallons.
## 1         651.36                18.00             4.7550969
## 2         180.00                 9.00             2.3775485
## 3         323.52                 9.00             2.3775485
## 4          51.74                 0.75             0.1981290
## 5         107.58                10.50             2.7738065
## 6          58.50                 1.50             0.3962581
  • Use the group_by() and summarise() functions in dplyr package to obtain a summarized data set of total Volume.Sold..Gallons. and Sale..Dollars. for different combinations of month and year.
  sales_summary <- AmesSale %>% 
  group_by(year, month) %>% 
  summarise(Total_Volume_Sold = sum(Volume.Sold..Liters.)*0.264172, 
            Total_Sales = sum(Sale..Dollars.))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
  • Draw a scatter plot for the trend of total sale over month. Please separate the trend for different years. You need to use color option in ggplot2.
sales_summary%>% ggplot(aes(x = month, y = Total_Sales, color = as.factor(year))) + 
  geom_point() 
## Warning: Removed 81 rows containing missing values (`geom_point()`).