Wine dataset: https://www.kaggle.com/datasets/elvinrustam/wine-dataset
winedata <- read.table("./WineDataset.csv", header = TRUE, sep = ",", dec = ",", quote = "\"")
head(winedata)
## Title
## 1 The Guv'nor, Spain
## 2 Bread & Butter 'Winemaker's Selection' Chardonnay 2020/21, California
## 3 Oyster Bay Sauvignon Blanc 2022, Marlborough
## 4 Louis Latour Mâcon-Lugny 2021/22
## 5 Bread & Butter 'Winemaker's Selection' Pinot Noir 2021, California
## 6 Louis Roederer 'Cristal' Champagne 2015
## Description
## 1 We asked some of our most prized winemakers working in Spain to make the best wines possible – no rules, no restrictions, no red tape. The Guv'nor collection was their answer. Made from Tempranillo grapes picked from their favourite vineyards across Spain, it’s a crowd-pleasing wine designed to be shared with good food – and even better company. It’s bold. It’s fruity. It’s a modern Spanish red that goes with everything, especially spicy barbecue meats or anything chargrilled.
## 2 This really does what it says on the tin. It’s a lush Chardonnay which tastes of – you guessed it – bread and butter. The winemaking team combine grapes from two very different regions to create something full but fresh. Monterey is known for its big Chardonnays, but Carneros is far cooler, so flavours are more understated. And you'll taste the benefits in this bold and buttery wine. It’s silky smooth with fresh notes of citrus and vanilla. It’s incredibly versatile, but we like it best with creamy tarragon chicken or soft, creamy cheese.
## 3 Oyster Bay has been an award-winning gold-standard Marlborough Sauvignon since its very first vintage, and it’s only got better and better. It was even formerly named the ‘Best Sauvignon Blanc in the World’ – and they have the name recognition to prove it. It’s really no surprise then that their Sauvignon is textbook Marlborough. Big. Fruity. Tropical. To say it’s mouth-wateringly refreshing is an understatement – the smell of gooseberries and lime will hit you the minute you unscrew the bottle. It’s really versatile – delicious as a glassful on sofa, for parties or with a flaky goat’s cheese tart.\n\nIn 2022, this wine won an IWC award for the 2021 vintage.
## 4 We’ve sold this wine for thirty years – and for that whole time it’s been a bestseller. Why’s it so popular? White Burgundy is a go-to for Chardonnay drinkers, and the Mâconnais in particular offers great value. Unlike the northern Côtes, the southern part of Burgundy doesn’t have a formalised vineyard hierarchy system. So the wines tend to be a bit more wallet-friendly. Not to mention, Lugny is one of the best regarded villages in the Mâcon, and Latour one of the most famous houses. Honeyed. Peachy. Crisp. A dream with salty finger foods or a classic Sunday roast chicken. It’s proof that sometimes it pays to stick with the old favourites.
## 5 Bread & Butter is that thing that you can count on. It’s dependable, comforting – you know what to expect. That’s exactly what you’ll get with this wine, a textbook Californian Pinot Noir from the same winery which brought you the sell-out Bread & Butter Chardonnay. Their vineyards are ideally located for classic varieties like Pinot. It’s sun-drenched, but cooled by the daily morning mists which roll in off the Pacific. It’s exactly the sort of position that pernickety Pinot thrives in. This deliciously silky wine proves our point. It’s all about red fruit. Cherries. Raspberries. Cranberries. It’s seriously juicy and seriously delicious, with a nice fullness which is classic California. Oak ageing gives it a touch of cedary spice, and woody herb notes. Perfect with roast mushrooms.
## 6 Cristal is Louis Roederer’s flagship wine. It was first made in 1876, when Tsar Alexander II asked the house to reserve their best cuvee for him every year. But he had another request. Fearful of an assassination attempt, he asked that it be produced in a clear bottle with no indent, called a ‘punt’, in the bottom. And an icon was born. It’s made from seven Grand Crus and only in the best years, when low-yielding Pinot Noir and Chardonnay vines produce perfectly ripe grapes. Rich and velvety, it has intense and complex notes of ripe citrus, golden apple and almond pastry with an incredibly long and elegant finish.
## Price Capacity Grape Secondary.Grape.Varieties
## 1 £9.99 per bottle 75CL Tempranillo
## 2 £15.99 per bottle 75CL Chardonnay
## 3 £12.49 per bottle 75CL Sauvignon Blanc
## 4 £17.99 per bottle 75CL Chardonnay
## 5 £15.99 per bottle 75CL Pinot Noir
## 6 £300.00 per bottle 75CL Chardonnay
## Closure Country Unit
## 1 Natural Cork Spain 10.5
## 2 Natural Cork USA 10.1
## 3 Screwcap New Zealand 9.8
## 4 Natural Cork France 10.1
## 5 Natural Cork USA 10.1
## 6 Natural Cork France 9
## Characteristics
## 1 Vanilla, Blackberry, Blackcurrant
## 2 Vanilla, Almond, Coconut, Green Apple, Peach, Pineapple, Stone Fruit
## 3 Tropical Fruit, Gooseberry, Grapefruit, Grass, Green Apple, Lemon, Stone Fruit
## 4 Peach, Apricot, Floral, Lemon
## 5 Smoke, Black Cherry, Cedar, Raspberry, Red Fruit
## 6
## Per.bottle...case...each Type ABV Region Style Vintage
## 1 per bottle Red ABV 14.00% Rich & Juicy NV
## 2 per bottle White ABV 13.50% California Rich & Toasty 2021
## 3 per bottle White ABV 13.00% Marlborough Crisp & Zesty 2022
## 4 per bottle White ABV 13.50% Burgundy Ripe & Rounded 2022
## 5 per bottle Red ABV 13.50% California Smooth & Mellow 2021
## 6 per bottle White ABV 12.00% 2015
## Appellation
## 1
## 2 Napa Valley
## 3
## 4 Macon
## 5 Napa Valley
## 6
Unit of observation: one wine product sold in a 750 ml bottle.
Let’s assume that the data are a simple random sample of wines from around the world.
Research question: Do wine prices differ between types of bottle closure?
Closure is a nominal variable with four levels: Natural Cork, Screwcap, Synthetic Cork and Vinolok.
Price is numeric, in pounds sterling.
winedata <- winedata[,c(3,4,5,7,8)] # Keep Price, Capacity, Grape, Closure and Country
library(readr)
winedata$Price <- readr::parse_number(winedata$Price, locale = readr::locale(decimal_mark = "."))
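`readr::parse_number()` drops the leading "£" and the trailing " per bottle" and keeps the number in between. If you would rather avoid the readr dependency, a base-R sketch could look like the one below. It assumes every price string contains exactly one number, uses "." as the decimal mark and has no thousands separator; `parse_price()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: base-R alternative to readr::parse_number() for strings
# such as "£9.99 per bottle". Assumes one number per string, "." as the
# decimal mark and no thousands separator.
parse_price <- function(x) {
  # keep only digits and the decimal point, then convert to numeric
  as.numeric(gsub("[^0-9.]", "", x))
}

# Example: parse_price(c("£9.99 per bottle", "£300.00 per bottle")) gives 9.99 and 300
```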
winedata <- winedata[winedata$Capacity == '75CL', ] # Picking only 750ml bottles
winedata <- winedata[winedata$Closure != '', ] # Removing units with empty closure
head(winedata)
## Price Capacity Grape Closure Country
## 1 9.99 75CL Tempranillo Natural Cork Spain
## 2 15.99 75CL Chardonnay Natural Cork USA
## 3 12.49 75CL Sauvignon Blanc Screwcap New Zealand
## 4 17.99 75CL Chardonnay Natural Cork France
## 5 15.99 75CL Pinot Noir Natural Cork USA
## 6 300.00 75CL Chardonnay Natural Cork France
library(pastecs)
round(stat.desc(winedata[,1]), digits=2)
## nbr.val nbr.null nbr.na min max range
## 1193.00 0.00 0.00 4.99 430.00 425.01
## sum median mean SE.mean CI.mean.0.95 var
## 33164.48 16.99 27.80 1.03 2.02 1265.82
## std.dev coef.var
## 35.58 1.28
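We have 1193 wines priced from 4.99 to 430 pounds. The median price is 16.99 pounds and the arithmetic mean is 27.80 pounds; a mean this far above the median points to a strongly right-skewed distribution. The standard error of the mean is 1.03 pounds, so the 95% confidence interval for the population mean price (of all 750 ml wine products in the world) is 27.80 ± 2.02, i.e. from 25.78 to 29.82 pounds. Strictly, this means that 95% of intervals built this way would contain the true mean, not that there is a 95% probability that this particular interval does.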
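The interval above is the mean plus or minus `CI.mean.0.95`. Assuming `stat.desc()` uses the usual t-based half-width (a t quantile with n - 1 degrees of freedom times the standard error), the same calculation can be sketched in base R and cross-checked against `t.test()`. `mean_ci()` is a hypothetical helper.

```r
# Hypothetical helper: t-based confidence interval for a population mean.
mean_ci <- function(x, level = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                              # standard error of the mean
  half <- qt(1 - (1 - level) / 2, df = n - 1) * se   # half-width of the interval
  c(lower = mean(x) - half, upper = mean(x) + half)
}

# Small check on toy data: the interval should match t.test()'s conf.int.
x <- c(1, 2, 3, 4, 5)
mean_ci(x)
as.numeric(t.test(x)$conf.int)
```

Running `mean_ci(winedata$Price)` at this point (before the outlier is removed) should give roughly 25.78 to 29.82 pounds.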
library(psych)
## Warning: package 'psych' was built under R version 4.3.2
describeBy(winedata$Price, group = winedata$Closure)
##
## Descriptive statistics by group
## group: Natural Cork
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 759 34.76 42.46 19.99 26.15 11.86 6.99 430 423.01 4.55 26 1.54
## ------------------------------------------------------------
## group: Screwcap
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 407 15.68 10.06 12.99 13.8 4.45 4.99 90 85.01 3.55 17.99 0.5
## ------------------------------------------------------------
## group: Synthetic Cork
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 21 14.23 3.95 13.99 13.93 4.45 8.99 24.99 16 0.67 0.38 0.86
## ------------------------------------------------------------
## group: Vinolok
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 6 16.82 2.86 15.99 16.82 2.22 13.99 21.99 8 0.78 -1.02 1.17
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.3.2
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
ggplot(winedata, aes(x = Price)) +
  geom_histogram(binwidth = 1, colour = "gray") +
  facet_wrap(~Closure, ncol = 1) +
  ylab("Frequency")
library(ggpubr)
## Warning: package 'ggpubr' was built under R version 4.3.2
ggboxplot(winedata, x = "Closure", y = "Price", add = "jitter")
The box plot shows one very expensive outlier among the Natural Cork wines.
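To make "outlier" less of an eyeball judgment, one could apply, group by group, the 1.5 × IQR rule that box plots conventionally use for their whiskers. This is a sketch, not the method used in this analysis, and `flag_iqr_outliers()` is a hypothetical helper. In a strongly right-skewed group such as Natural Cork the rule is likely to flag many wines rather than just one, so removing only the 430-pound bottle remains a judgment call.

```r
# Hypothetical helper: TRUE for values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
flag_iqr_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

# Toy check: only the extreme value 100 is flagged.
toy_prices <- c(1:10, 100)
which(flag_iqr_outliers(toy_prices))  # 11
```

Applied per group, e.g. `tapply(winedata$Price, winedata$Closure, function(p) sum(flag_iqr_outliers(p)))`, it counts the flagged wines for each closure type.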
library(plyr)
## Warning: package 'plyr' was built under R version 4.3.2
##
## Attaching package: 'plyr'
## The following object is masked from 'package:ggpubr':
##
## mutate
head(arrange(winedata[winedata$Closure == "Natural Cork",],desc(Price)), n = 10)
## Price Capacity Grape Closure Country
## 1 430 75CL Chardonnay Natural Cork France
## 2 315 75CL Chardonnay Natural Cork France
## 3 310 75CL Pinot Noir Natural Cork France
## 4 300 75CL Chardonnay Natural Cork France
## 5 300 75CL Pinot Noir Natural Cork France
## 6 300 75CL Pinot Noir Natural Cork France
## 7 280 75CL Chardonnay Natural Cork France
## 8 270 75CL Cabernet Sauvignon Natural Cork Italy
## 9 265 75CL Chardonnay Natural Cork France
## 10 250 75CL Chardonnay Natural Cork France
winedata <- winedata[winedata$Price != 430,] # Remove the 430-pound French Chardonnay outlier
library(rstatix)
## Warning: package 'rstatix' was built under R version 4.3.2
##
## Attaching package: 'rstatix'
## The following objects are masked from 'package:plyr':
##
## desc, mutate
## The following object is masked from 'package:stats':
##
## filter
winedata %>% group_by(Closure) %>% shapiro_test(Price) #Normality test within the groups
## # A tibble: 4 × 4
## Closure variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 Natural Cork Price 0.539 1.76e-40
## 2 Screwcap Price 0.663 1.34e-27
## 3 Synthetic Cork Price 0.933 1.61e- 1
## 4 Vinolok Price 0.877 2.58e- 1
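H0: Price is normally distributed within the group. Normality is not met for Natural Cork and Screwcap wines (p < 0.001), so for these two groups we reject H0. For Synthetic Cork (p = 0.161) and Vinolok (p = 0.258) we cannot reject H0, although with only 21 and 6 wines these tests have little power.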
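The same per-group Shapiro-Wilk decision can be sketched in base R with `tapply()` and `shapiro.test()`, which accepts between 3 and 5000 observations per group (all four closure groups qualify here). `shapiro_by_group()` is a hypothetical helper, and alpha = 0.05 is assumed.

```r
# Hypothetical helper: Shapiro-Wilk p-value per group and the decision at alpha.
shapiro_by_group <- function(values, groups, alpha = 0.05) {
  p <- tapply(values, groups, function(v) shapiro.test(v)$p.value)
  data.frame(
    group = names(p),
    p = as.numeric(p),
    reject_normality = as.numeric(p) < alpha  # TRUE means normality is rejected
  )
}

# Toy check with clearly skewed (exponential) data in both groups.
set.seed(42)
toy <- shapiro_by_group(rexp(400), rep(c("a", "b"), each = 200))
toy
```

Called as `shapiro_by_group(winedata$Price, winedata$Closure)`, it should mark Natural Cork and Screwcap as rejected and the two small groups as not rejected.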
library(car)
## Warning: package 'car' was built under R version 4.3.2
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
leveneTest(winedata$Price, group = winedata$Closure)
## Warning in leveneTest.default(winedata$Price, group = winedata$Closure):
## winedata$Closure coerced to factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 19.529 2.314e-12 ***
## 1188
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
H0: The variance of price is the same for all closure types
H1: At least one variance differs
We reject H0 (homoskedasticity) at p < 0.001, so the variances are not equal. This is why Welch’s F-test, which does not assume equal variances, would be the appropriate test if normality were met.
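With `center = median`, as in the output above, Levene's test is the Brown-Forsythe variant: a one-way ANOVA on the absolute deviations of each price from its group median. A base-R sketch of that idea follows; `brown_forsythe()` is a hypothetical helper. Applied to `winedata$Price` and `winedata$Closure`, it should reproduce F = 19.529 on 3 and 1188 degrees of freedom.

```r
# Hypothetical helper: Brown-Forsythe test (Levene's test centred on the median).
brown_forsythe <- function(y, g) {
  g <- factor(g)
  resp <- abs(y - ave(y, g, FUN = median))  # |value - its group's median|
  tab <- anova(lm(resp ~ g))                # one-way ANOVA on those deviations
  list(F = tab[1, "F value"], df1 = tab[1, "Df"], df2 = tab[2, "Df"],
       p = tab[1, "Pr(>F)"])
}

# Toy check: one group has a much larger spread than the other.
set.seed(1)
y <- c(rnorm(50, sd = 1), rnorm(50, sd = 5))
g <- rep(c("narrow", "wide"), each = 50)
bf <- brown_forsythe(y, g)
bf$p
```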
library(onewaytests)
## Warning: package 'onewaytests' was built under R version 4.3.2
##
## Attaching package: 'onewaytests'
## The following object is masked from 'package:psych':
##
## describe
welch.test(Price ~ Closure, data = winedata)
##
## Welch's Heteroscedastic F Test (alpha = 0.05)
## -------------------------------------------------------------
## data : Price and Closure
##
## statistic : 50.16979
## num df : 3
## denom df : 26.18773
## p.value : 5.425083e-11
##
## Result : Difference is statistically significant.
## -------------------------------------------------------------
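H0: The mean price is the same for all 4 types of closure
H1: At least one mean differs
If normality were met, we would reject H0 at p < 0.001. Because normality is violated for the two largest groups (Natural Cork and Screwcap), we rely on the non-parametric Kruskal-Wallis test instead.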
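Base R offers the same test as `oneway.test(Price ~ Closure, data = winedata, var.equal = FALSE)`. The sketch below spells out Welch's statistic from the group sizes, means and variances and checks it against `oneway.test()` on toy data; `welch_f()` is a hypothetical helper. On the wine data it should give about F = 50.17 with 3 and 26.19 degrees of freedom, as in the `welch.test()` output.

```r
# Hypothetical helper: Welch's heteroscedastic one-way F test.
welch_f <- function(y, g) {
  g <- factor(g)
  k <- nlevels(g)
  n <- tapply(y, g, length)
  m <- tapply(y, g, mean)
  v <- tapply(y, g, var)
  w <- n / v                                  # precision weights n_i / s_i^2
  m_w <- sum(w * m) / sum(w)                  # weighted grand mean
  tmp <- sum((1 - w / sum(w))^2 / (n - 1)) / (k^2 - 1)
  stat <- sum(w * (m - m_w)^2) / ((k - 1) * (1 + 2 * (k - 2) * tmp))
  df2 <- 1 / (3 * tmp)                        # Welch-Satterthwaite denominator df
  list(stat = stat, df1 = k - 1, df2 = df2,
       p = pf(stat, k - 1, df2, lower.tail = FALSE))
}

# Toy check against base R's implementation.
set.seed(1)
toy <- data.frame(
  y = c(rnorm(10, 0, 1), rnorm(15, 1, 2), rnorm(8, 2, 3)),
  g = rep(c("a", "b", "c"), c(10, 15, 8))
)
mine <- welch_f(toy$y, toy$g)
ref <- oneway.test(y ~ g, data = toy, var.equal = FALSE)
c(mine$stat, unname(ref$statistic))
```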
kruskal.test(Price ~ Closure, data = winedata)
##
## Kruskal-Wallis rank sum test
##
## data: Price by Closure
## Kruskal-Wallis chi-squared = 242.61, df = 3, p-value < 2.2e-16
H0: Location distributions of price for all types of closures are the same
H1: At least one is different
We reject the H0 at p-value < 0.001
kruskal_effsize(Price ~ Closure, data = winedata)
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 Price 1192 0.202 eta2[H] large
The effect size is large (eta²[H] = 0.202).
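rstatix computes eta²[H] from the Kruskal-Wallis statistic as (H - k + 1)/(n - k), where k is the number of groups and n the number of observations. With the values above, (242.61 - 4 + 1)/(1192 - 4) ≈ 0.2017, and the rstatix documentation labels values of 0.14 and above as large. A base-R sketch of both steps, with hypothetical helper names:

```r
# Hypothetical helper: eta-squared based on the Kruskal-Wallis H statistic.
kruskal_eta2 <- function(y, g) {
  g <- factor(g)
  H <- unname(kruskal.test(y, g)$statistic)
  k <- nlevels(g)
  n <- length(y)
  (H - k + 1) / (n - k)
}

# Hypothetical helper: magnitude labels using the cut-offs 0.01, 0.06 and 0.14.
eta2_magnitude <- function(e) {
  cut(e, breaks = c(-Inf, 0.01, 0.06, 0.14, Inf),
      labels = c("negligible", "small", "moderate", "large"), right = FALSE)
}

# Plugging in the reported values: H = 242.61, k = 4, n = 1192.
e <- (242.61 - 4 + 1) / (1192 - 4)
round(e, 3)          # 0.202
eta2_magnitude(e)    # large
```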
closures_nonpar <- wilcox_test(Price ~ Closure, paired = FALSE, p.adjust.method = "bonferroni", data = winedata)
closures_nonpar
## # A tibble: 6 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 Price Natural Cork Screw… 758 407 237795 1.13e-52 6.78e-52 ****
## 2 Price Natural Cork Synth… 758 21 12377 1.38e- 5 8.28e- 5 ****
## 3 Price Natural Cork Vinol… 758 6 2996 1.8 e- 1 1 e+ 0 ns
## 4 Price Screwcap Synth… 407 21 3960 5.7 e- 1 1 e+ 0 ns
## 5 Price Screwcap Vinol… 407 6 695 6.9 e- 2 4.16e- 1 ns
## 6 Price Synthetic C… Vinol… 21 6 36 1.21e- 1 7.26e- 1 ns
H0: The location distributions of price in group 1 and group 2 are the same
H1: They are different
After the Bonferroni correction, we reject H0 for Natural Cork vs Screwcap and for Natural Cork vs Synthetic Cork (adjusted p < 0.001).
We cannot reject H0 for any other pair, including Natural Cork vs Vinolok (adjusted p > 0.05).
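With six pairwise comparisons, the Bonferroni adjustment multiplies each raw p-value by 6 and caps the result at 1, which is why, for instance, 1.38e-5 becomes 8.28e-5 in the table. The sketch below checks that `p.adjust()` does exactly this, using the raw p-values as printed (so rounded). Base R's `pairwise.wilcox.test(winedata$Price, winedata$Closure, p.adjust.method = "bonferroni")` should give closely matching adjusted p-values.

```r
# Raw p-values from the pairwise table above (as printed, so rounded).
p_raw <- c(1.13e-52, 1.38e-5, 0.18, 0.57, 0.069, 0.121)

# Bonferroni: multiply by the number of comparisons, cap at 1.
p_bonf <- pmin(1, p_raw * length(p_raw))
p_bonf
all.equal(p.adjust(p_raw, method = "bonferroni"), p_bonf)  # TRUE
```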
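Conclusion:
Prices of 750 ml bottles of wine differ between the 4 types of closure (Kruskal-Wallis test, p < 0.001), and the effect size is large. The pairwise tests locate the difference between natural cork wines, which tend to be more expensive, and screwcap and synthetic cork wines.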