Project: Hypothesis - Do higher-priced liquor products generate more revenue despite selling fewer bottles, and do consumers rate higher-priced products better?
My name is Olivia, and I am a senior studying Marketing and Business Analytics & Information Systems. After graduation, I will be joining 84.51° as a consultant and analyst. The spirits industry has grown significantly in recent years, with consumers becoming increasingly selective about the brands and styles they purchase. This analysis focuses on two of the most popular spirit categories (whiskey and vodka) and explores what factors contribute to higher ratings among enthusiasts. Specifically, I am interested in whether the type of spirit, its origin, or its price point influences how it is rated on platforms like Distiller, one of the most popular spirits review websites.
As someone interested in consumer behavior and the beverage industry, I find it compelling that spirits can vary so dramatically in price while sometimes receiving similar ratings to much cheaper alternatives. This raises the question: do consumers actually rate more expensive spirits higher, or is price simply a marketing tool? Using data scraped from Distiller’s curated lists of top-rated whiskeys and vodkas, this analysis aims to describe the rating landscape across both categories and identify any patterns that emerge between spirit characteristics and user ratings.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 12591077 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): Invoice/Item Number, Date, Store Name, Address, City, Store Locati...
dbl (9): Store Number, Zip Code, Category, Item Number, Pack, Bottle Volume...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 10 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): vodka_name, vodka_description
dbl (1): vodka_rating
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 10 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): whiskey_name, whiskey_description
dbl (1): whiskey_rating
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Summary Stats for Iowa Liquor Dataset
summary(iowa_liquor)
Invoice/Item Number   Date                  Store Number          Store Name
Length:12591077       Length:12591077       Min.   :2106          Length:12591077
Class :character      Class :character      1st Qu.:2604          Class :character
Mode  :character      Mode  :character      Median :3704          Mode  :character
                                            Mean   :3565
                                            3rd Qu.:4304
                                            Max.   :9932
Address               City                  Zip Code              Store Location
Length:12591077       Length:12591077       Min.   :50002         Length:12591077
Class :character      Class :character      1st Qu.:50316         Class :character
Mode  :character      Mode  :character      Median :51101         Mode  :character
                                            Mean   :51268
                                            3rd Qu.:52310
                                            Max.   :56201
                                            NA's   :10360
County Number         County                Category              Category Name
Length:12591077       Length:12591077       Min.   : 101220       Length:12591077
Class :character      Class :character      1st Qu.:1012210       Class :character
Mode  :character      Mode  :character      Median :1031200       Mode  :character
                                            Mean   :1044710
                                            3rd Qu.:1062310
                                            Max.   :1901200
                                            NA's   :8020
Vendor Number         Vendor Name           Item Number           Item Description
Length:12591077       Length:12591077       Min.   :   101        Length:12591077
Class :character      Class :character      1st Qu.: 27056        Class :character
Mode  :character      Mode  :character      Median : 38177        Mode  :character
                                            Mean   : 46037
                                            3rd Qu.: 63755
                                            Max.   :999275
Pack                  Bottle Volume (ml)    State Bottle Cost     State Bottle Retail
Min.   :  1.00        Min.   :     0.0      Length:12591077       Length:12591077
1st Qu.:  6.00        1st Qu.:   750.0      Class :character      Class :character
Median : 12.00        Median :   750.0      Mode  :character      Mode  :character
Mean   : 12.23        Mean   :   928.9
3rd Qu.: 12.00        3rd Qu.:  1000.0
Max.   :600.00        Max.   :378000.0
Bottles Sold          Sale (Dollars)        Volume Sold (Liters)
Min.   :    0.00      Length:12591077       Min.   :    0.000
1st Qu.:    2.00      Class :character      1st Qu.:    1.500
Median :    4.00      Mode  :character      Median :    3.000
Mean   :    8.14                            Mean   :    7.489
3rd Qu.:   12.00                            3rd Qu.:    9.000
Max.   :15000.00                            Max.   :15000.000
Volume Sold (Gallons)
Min.   :   0.000
1st Qu.:   0.400
Median :   0.790
Mean   :   1.977
3rd Qu.:   2.380
Max.   :3962.580
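The summary above reports `State Bottle Retail` and `Sale (Dollars)` as character columns, so they cannot be summed or binned until they are converted to numbers. The sketch below shows one way the revenue half of the hypothesis could be checked. It assumes those columns hold currency-style strings such as "$12.99" (the output above does not show the raw values), and it runs on a small made-up tibble, `toy_liquor`, rather than the real `iowa_liquor` data. The helper name `summarise_by_price_tier` and the tier cut points of $15 and $30 are illustrative choices, not part of the original analysis.

```r
library(dplyr)
library(readr)

# Toy stand-in for iowa_liquor. All values are made up for illustration only.
toy_liquor <- tibble::tibble(
  `State Bottle Retail` = c("$8.99", "$12.50", "$24.99", "$54.00", "$9.75", "$31.00"),
  `Bottles Sold`        = c(24, 12, 6, 2, 36, 4),
  `Sale (Dollars)`      = c("$215.76", "$150.00", "$149.94", "$108.00", "$351.00", "$124.00")
)

# Hypothetical helper: convert the currency strings to numbers, bin each row
# into a price tier, then total bottles and revenue per tier.
summarise_by_price_tier <- function(df,
                                    breaks = c(0, 15, 30, Inf),
                                    labels = c("budget", "mid", "premium")) {
  df %>%
    mutate(
      retail  = parse_number(`State Bottle Retail`),  # "$8.99" -> 8.99
      revenue = parse_number(`Sale (Dollars)`),
      tier    = cut(retail, breaks = breaks, labels = labels,
                    include.lowest = TRUE)
    ) %>%
    group_by(tier) %>%
    summarise(
      total_bottles      = sum(`Bottles Sold`, na.rm = TRUE),
      total_revenue      = sum(revenue, na.rm = TRUE),
      revenue_per_bottle = total_revenue / total_bottles,
      .groups = "drop"
    )
}

tier_summary <- summarise_by_price_tier(toy_liquor)
print(tier_summary)
```

Comparing `total_bottles` with `total_revenue` across tiers is what would support or contradict the hypothesis. Note that `parse_number()` expects character input, so the conversion step should be skipped for any column that was already read as numeric.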
When looking at whiskeys and vodkas, more expensive spirits tend to receive higher ratings. Premium whiskeys, often aged longer and produced in smaller batches, consistently score higher than budget options. The same pattern holds for vodkas, where higher-end bottles tend to outperform cheaper alternatives in user reviews.
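A claim like this can be checked with a rank correlation between price and rating. The scraped vodka and whiskey files loaded earlier contain only a name, a description, and a rating, so a price column would first have to be joined in from another source. The sketch below assumes that join has already happened and uses a made-up data frame, `toy_spirits`, with hypothetical `price` and `rating` columns. The helper name `price_rating_test` is also illustrative.

```r
# Hypothetical joined table. Names, prices, and ratings are made up for
# illustration and do not come from the Distiller or Iowa data.
toy_spirits <- data.frame(
  name   = c("A", "B", "C", "D", "E"),
  price  = c(15, 22, 35, 60, 120),
  rating = c(3.4, 3.9, 3.7, 4.2, 4.5)
)

# Spearman's rank correlation: asks whether ratings tend to rise with price,
# without assuming the relationship is linear.
price_rating_test <- function(df) {
  cor.test(df$price, df$rating, method = "spearman", exact = FALSE)
}

result <- price_rating_test(toy_spirits)
print(result$estimate)  # Spearman's rho
print(result$p.value)
```

With only 10 rows per category in the scraped lists, even a large rho would be weak evidence, so the p-value and the sample size should be reported alongside the estimate.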