Homework

Wine dataset: https://www.kaggle.com/datasets/elvinrustam/wine-dataset

winedata <- read.table("./WineDataset.csv", header = TRUE, sep = ",", dec = ",", quote = "\"")
head(winedata)

##                                                                   Title
## 1                                                    The Guv'nor, Spain
## 2 Bread & Butter 'Winemaker's Selection' Chardonnay 2020/21, California
## 3                          Oyster Bay Sauvignon Blanc 2022, Marlborough
## 4                                      Louis Latour Mâcon-Lugny 2021/22
## 5    Bread & Butter 'Winemaker's Selection' Pinot Noir 2021, California
## 6                               Louis Roederer 'Cristal' Champagne 2015
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Description
## 1                                                                                                                                                                                                                                                                                                                         We asked some of our most prized winemakers working in Spain to make the best wines possible – no rules, no restrictions, no red tape. The Guv'nor collection was their answer. Made from Tempranillo grapes picked from their favourite vineyards across Spain, it’s a crowd-pleasing wine designed to be shared with good food – and even better company. It’s bold. It’s fruity. It’s a modern Spanish red that goes with everything, especially spicy barbecue meats or anything chargrilled.
## 2                                                                                                                                                                                                                                                          This really does what it says on the tin. It’s a lush Chardonnay which tastes of – you guessed it – bread and butter. The winemaking team combine grapes from two very different regions to create something full but fresh. Monterey is known for its big Chardonnays, but Carneros is far cooler, so flavours are more understated. And you'll taste the benefits in this bold and buttery wine. It’s silky smooth with fresh notes of citrus and vanilla. It’s incredibly versatile, but we like it best with creamy tarragon chicken or soft, creamy cheese.
## 3                                                                                                                                Oyster Bay has been an award-winning gold-standard Marlborough Sauvignon since its very first vintage, and it’s only got better and better. It was even formerly named the ‘Best Sauvignon Blanc in the World’ – and they have the name recognition to prove it. It’s really no surprise then that their Sauvignon is textbook Marlborough. Big. Fruity. Tropical. To say it’s mouth-wateringly refreshing is an understatement – the smell of gooseberries and lime will hit you the minute you unscrew the bottle. It’s really versatile – delicious as a glassful on sofa, for parties or with a flaky goat’s cheese tart.\n\nIn 2022, this wine won an IWC award for the 2021 vintage.
## 4                                                                                                                                                   We’ve sold this wine for thirty years – and for that whole time it’s been a bestseller. Why’s it so popular? White Burgundy is a go-to for Chardonnay drinkers, and the Mâconnais in particular offers great value. Unlike the northern Côtes, the southern part of Burgundy doesn’t have a formalised vineyard hierarchy system. So the wines tend to be a bit more wallet-friendly. Not to mention, Lugny is one of the best regarded villages in the Mâcon, and Latour one of the most famous houses. Honeyed. Peachy. Crisp. A dream with salty finger foods or a classic Sunday roast chicken. It’s proof that sometimes it pays to stick with the old favourites.
## 5 Bread & Butter is that thing that you can count on. It’s dependable, comforting – you know what to expect. That’s exactly what you’ll get with this wine, a textbook Californian Pinot Noir from the same winery which brought you the sell-out Bread & Butter Chardonnay. Their vineyards are ideally located for classic varieties like Pinot. It’s sun-drenched, but cooled by the daily morning mists which roll in off the Pacific. It’s exactly the sort of position that pernickety Pinot thrives in. This deliciously silky wine proves our point. It’s all about red fruit. Cherries. Raspberries. Cranberries. It’s seriously juicy and seriously delicious, with a nice fullness which is classic California. Oak ageing gives it a touch of cedary spice, and woody herb notes. Perfect with roast mushrooms.
## 6                                                                                                                                                                             Cristal is Louis Roederer’s flagship wine. It was first made in 1876, when Tsar Alexander II asked the house to reserve their best cuvee for him every year. But he had another request. Fearful of an assassination attempt, he asked that it be produced in a clear bottle with no indent, called a ‘punt’, in the bottom. And an icon was born. It’s made from seven Grand Crus and only in the best years, when low-yielding Pinot Noir and Chardonnay vines produce perfectly ripe grapes. Rich and velvety, it has intense and complex notes of ripe citrus, golden apple and almond pastry with an incredibly long and elegant finish.
##                Price Capacity           Grape Secondary.Grape.Varieties
## 1   £9.99 per bottle     75CL     Tempranillo                          
## 2  £15.99 per bottle     75CL      Chardonnay                          
## 3  £12.49 per bottle     75CL Sauvignon Blanc                          
## 4  £17.99 per bottle     75CL      Chardonnay                          
## 5  £15.99 per bottle     75CL      Pinot Noir                          
## 6 £300.00 per bottle     75CL      Chardonnay                          
##        Closure     Country Unit
## 1 Natural Cork       Spain 10.5
## 2 Natural Cork         USA 10.1
## 3     Screwcap New Zealand  9.8
## 4 Natural Cork      France 10.1
## 5 Natural Cork         USA 10.1
## 6 Natural Cork      France    9
##                                                                  Characteristics
## 1                                              Vanilla, Blackberry, Blackcurrant
## 2           Vanilla, Almond, Coconut, Green Apple, Peach, Pineapple, Stone Fruit
## 3 Tropical Fruit, Gooseberry, Grapefruit, Grass, Green Apple, Lemon, Stone Fruit
## 4                                                  Peach, Apricot, Floral, Lemon
## 5                               Smoke, Black Cherry, Cedar, Raspberry, Red Fruit
## 6                                                                               
##   Per.bottle...case...each  Type        ABV      Region           Style Vintage
## 1               per bottle   Red ABV 14.00%                Rich & Juicy      NV
## 2               per bottle White ABV 13.50%  California   Rich & Toasty    2021
## 3               per bottle White ABV 13.00% Marlborough   Crisp & Zesty    2022
## 4               per bottle White ABV 13.50%    Burgundy  Ripe & Rounded    2022
## 5               per bottle   Red ABV 13.50%  California Smooth & Mellow    2021
## 6               per bottle White ABV 12.00%                                2015
##   Appellation
## 1            
## 2 Napa Valley
## 3            
## 4       Macon
## 5 Napa Valley
## 6

Unit: 750 ml Wine product

Let’s assume that the data are a simple random sample of wines from around the world.

Research question: Do prices differ between types of bottle closure?

Closures can be: Natural Cork, Screwcap, Synthetic Cork, Vinolok (nominal variable).

Price is numeric, in pounds.

winedata <- winedata[,c(3,4,5,7,8)] #Leaving just several columns
library(readr)
winedata$Price <- readr::parse_number(winedata$Price, locale = readr::locale(decimal_mark = "."))
winedata <- winedata[winedata$Capacity == '75CL', ] # Picking only 750ml bottles
winedata <- winedata[winedata$Closure != '', ] # Removing units with empty closure
head(winedata)

##    Price Capacity           Grape      Closure     Country
## 1   9.99     75CL     Tempranillo Natural Cork       Spain
## 2  15.99     75CL      Chardonnay Natural Cork         USA
## 3  12.49     75CL Sauvignon Blanc     Screwcap New Zealand
## 4  17.99     75CL      Chardonnay Natural Cork      France
## 5  15.99     75CL      Pinot Noir Natural Cork         USA
## 6 300.00     75CL      Chardonnay Natural Cork      France
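The price-cleaning step can be checked on a few strings without any packages. This is only a base R sketch on made-up strings shaped like the raw Price column; it is a stand-in for `readr::parse_number`, not the same method, and it would break on strings that contain other digits or dots.

```r
# Hypothetical sample strings in the same shape as the raw Price column
raw_price <- c("\u00a39.99 per bottle",
               "\u00a315.99 per bottle",
               "\u00a3300.00 per bottle")

# Drop everything except digits and the decimal point, then convert
clean_price <- as.numeric(gsub("[^0-9.]", "", raw_price))
print(clean_price)
```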

library(pastecs)
round(stat.desc(winedata[,1]), digits=2)

##      nbr.val     nbr.null       nbr.na          min          max        range 
##      1193.00         0.00         0.00         4.99       430.00       425.01 
##          sum       median         mean      SE.mean CI.mean.0.95          var 
##     33164.48        16.99        27.80         1.03         2.02      1265.82 
##      std.dev     coef.var 
##        35.58         1.28

We have 1193 units, which cost from 4.99 to 430 pounds. The median price is 16.99 pounds and the arithmetic mean is 27.80 pounds. The standard error of the mean estimate is 1.03. We are 95% confident that the true mean price in the population (all 750 ml wine products in the world) lies between 25.78 and 29.82 pounds (27.80 ± 2.02).
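The interval can be rebuilt by hand from the rounded values in the `stat.desc` output. This sketch assumes the usual t-based interval for a mean (standard error times the two-sided 95% critical value), so small differences from rounding are possible.

```r
# Values copied from the stat.desc() output (rounded to 2 digits)
n      <- 1193
mean_p <- 27.80
sd_p   <- 35.58

se_mean <- sd_p / sqrt(n)          # standard error of the mean
t_crit  <- qt(0.975, df = n - 1)   # two-sided 95% critical value
half_ci <- t_crit * se_mean        # corresponds to CI.mean.0.95

round(c(se = se_mean, half_width = half_ci,
        lower = mean_p - half_ci, upper = mean_p + half_ci), 2)
```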

library(psych)

## Warning: package 'psych' was built under R version 4.3.2

describeBy(winedata$Price, group = winedata$Closure)

## 
##  Descriptive statistics by group 
## group: Natural Cork
##    vars   n  mean    sd median trimmed   mad  min max  range skew kurtosis   se
## X1    1 759 34.76 42.46  19.99   26.15 11.86 6.99 430 423.01 4.55       26 1.54
## ------------------------------------------------------------ 
## group: Screwcap
##    vars   n  mean    sd median trimmed  mad  min max range skew kurtosis  se
## X1    1 407 15.68 10.06  12.99    13.8 4.45 4.99  90 85.01 3.55    17.99 0.5
## ------------------------------------------------------------ 
## group: Synthetic Cork
##    vars  n  mean   sd median trimmed  mad  min   max range skew kurtosis   se
## X1    1 21 14.23 3.95  13.99   13.93 4.45 8.99 24.99    16 0.67     0.38 0.86
## ------------------------------------------------------------ 
## group: Vinolok
##    vars n  mean   sd median trimmed  mad   min   max range skew kurtosis   se
## X1    1 6 16.82 2.86  15.99   16.82 2.22 13.99 21.99     8 0.78    -1.02 1.17

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.3.2

## 
## Attaching package: 'ggplot2'

## The following objects are masked from 'package:psych':
## 
##     %+%, alpha

ggplot(winedata, aes(x = Price)) +
  geom_histogram(binwidth = 1, colour='gray') +
  facet_wrap(~Closure, ncol = 1) +
  ylab("Frequency")

library(ggpubr)

## Warning: package 'ggpubr' was built under R version 4.3.2

ggboxplot(winedata, x = "Closure", y = "Price", add = "jitter")

It looks as if there is one very expensive outlier among the Natural Cork wines.

library(plyr)

## Warning: package 'plyr' was built under R version 4.3.2

## 
## Attaching package: 'plyr'

## The following object is masked from 'package:ggpubr':
## 
##     mutate

head(arrange(winedata[winedata$Closure == "Natural Cork",],desc(Price)), n = 10)

##    Price Capacity              Grape      Closure Country
## 1    430     75CL         Chardonnay Natural Cork  France
## 2    315     75CL         Chardonnay Natural Cork  France
## 3    310     75CL         Pinot Noir Natural Cork  France
## 4    300     75CL         Chardonnay Natural Cork  France
## 5    300     75CL         Pinot Noir Natural Cork  France
## 6    300     75CL         Pinot Noir Natural Cork  France
## 7    280     75CL         Chardonnay Natural Cork  France
## 8    270     75CL Cabernet Sauvignon Natural Cork   Italy
## 9    265     75CL         Chardonnay Natural Cork  France
## 10   250     75CL         Chardonnay Natural Cork  France

winedata <- winedata[winedata$Price != 430,] #Removing this French outlier
library(rstatix)

## Warning: package 'rstatix' was built under R version 4.3.2

## 
## Attaching package: 'rstatix'

## The following objects are masked from 'package:plyr':
## 
##     desc, mutate

## The following object is masked from 'package:stats':
## 
##     filter

winedata %>% group_by(Closure) %>% shapiro_test(Price) #Normality test within the groups

## # A tibble: 4 × 4
##   Closure        variable statistic        p
##   <chr>          <chr>        <dbl>    <dbl>
## 1 Natural Cork   Price        0.539 1.76e-40
## 2 Screwcap       Price        0.663 1.34e-27
## 3 Synthetic Cork Price        0.933 1.61e- 1
## 4 Vinolok        Price        0.877 2.58e- 1

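Normality is not met for prices of wines with Natural Cork and Screwcap closures: for these groups we reject the H0 (that price is normally distributed) at p-value < 0.001. For Synthetic Cork (p = 0.161) and Vinolok (p = 0.258) we can’t reject the H0, although these groups are small (21 and 6 units).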
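To see how the group-wise Shapiro-Wilk p-values are read, here is a base R sketch on simulated data (not the wine prices). The group names and distribution parameters are made up; a small p-value means we reject normality for that group.

```r
set.seed(1)  # simulated data, not the wine prices
sim <- data.frame(
  group = rep(c("skewed", "symmetric"), each = 200),
  value = c(rlnorm(200, meanlog = 3, sdlog = 1),
            rnorm(200, mean = 20, sd = 4))
)

# Run shapiro.test() within each group and collect W and the p-value
res <- sapply(split(sim$value, sim$group), function(x) {
  st <- shapiro.test(x)
  c(W = unname(st$statistic), p = st$p.value)
})
print(res)
```

The right-skewed group should give a very small p-value, like Natural Cork and Screwcap in the real data.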

library(car)

## Warning: package 'car' was built under R version 4.3.2

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:psych':
## 
##     logit

leveneTest(winedata$Price, group = winedata$Closure)

## Warning in leveneTest.default(winedata$Price, group = winedata$Closure):
## winedata$Closure coerced to factor.

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    3  19.529 2.314e-12 ***
##       1188                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

H0: All variances of price within all closures are the same

H1: At least one variance differs

Here we reject the H0 (equal variances, i.e. homoskedasticity) at p-value < 0.001, so the price variances differ between closures. If normality were met, we would use Welch’s F-test.
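Levene's test with `center = median` is a one-way ANOVA on the absolute deviations from each group's median. A minimal base R sketch of that idea, with a hypothetical helper `levene_median` and simulated data:

```r
# Hypothetical helper: ANOVA on absolute deviations from group medians
levene_median <- function(y, g) {
  g   <- factor(g)
  dev <- abs(y - ave(y, g, FUN = median))  # deviation from own group median
  anova(lm(dev ~ g))                       # one-way ANOVA on the deviations
}

set.seed(2)  # simulated data with clearly unequal spread
y <- c(rnorm(100, mean = 20, sd = 2), rnorm(100, mean = 20, sd = 10))
g <- rep(c("narrow", "wide"), each = 100)

lev <- levene_median(y, g)
print(lev)
```

A small Pr(>F) here means the spreads differ between groups, which is the same reading as in the `leveneTest` output.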

library(onewaytests)

## Warning: package 'onewaytests' was built under R version 4.3.2

## 
## Attaching package: 'onewaytests'

## The following object is masked from 'package:psych':
## 
##     describe

welch.test(Price ~ Closure, data = winedata)

## 
##   Welch's Heteroscedastic F Test (alpha = 0.05) 
## ------------------------------------------------------------- 
##   data : Price and Closure 
## 
##   statistic  : 50.16979 
##   num df     : 3 
##   denom df   : 26.18773 
##   p.value    : 5.425083e-11 
## 
##   Result     : Difference is statistically significant. 
## -------------------------------------------------------------

H0: The mean prices for all 4 types of closure are the same

H1: At least one mean is different

We would reject the H0 at p-value < 0.001 if normality were met.
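Base R also has a Welch-type one-way test: `oneway.test` with `var.equal = FALSE`. The sketch below runs it on simulated prices for three made-up closure types with unequal variances and group sizes; it is an illustration, not a rerun of the wine analysis.

```r
set.seed(3)  # simulated prices for three hypothetical closure types
sim <- data.frame(
  closure = rep(c("A", "B", "C"), times = c(200, 100, 20)),
  price   = c(rnorm(200, mean = 35, sd = 15),
              rnorm(100, mean = 16, sd = 5),
              rnorm(20,  mean = 14, sd = 4))
)

# var.equal = FALSE requests the Welch correction for unequal variances
welch <- oneway.test(price ~ closure, data = sim, var.equal = FALSE)
print(welch)
```

The numerator df is the number of groups minus one, and the denominator df is usually not an integer, as in the `welch.test` output.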

kruskal.test(Price ~ Closure, data = winedata)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  Price by Closure
## Kruskal-Wallis chi-squared = 242.61, df = 3, p-value < 2.2e-16

H0: The price distributions have the same location for all types of closure

H1: At least one is different

We reject the H0 at p-value < 0.001.

kruskal_effsize(Price ~ Closure, data = winedata)

## # A tibble: 1 × 5
##   .y.       n effsize method  magnitude
## * <chr> <int>   <dbl> <chr>   <ord>    
## 1 Price  1192   0.202 eta2[H] large

Effect size is large (eta2[H] = 0.202).
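The effect size can be checked by hand from the Kruskal-Wallis statistic, assuming the usual formula eta2[H] = (H - k + 1) / (n - k), where H is the test statistic, k the number of groups and n the number of units. The cut-offs used below (0.01, 0.06, 0.14) are the commonly quoted ones and are an assumption here.

```r
H <- 242.61   # Kruskal-Wallis chi-squared from the test output
k <- 4        # number of closure types
n <- 1192     # units left after removing the outlier

eta2_H <- (H - k + 1) / (n - k)
print(round(eta2_H, 3))

# Assumed conventional cut-offs for the magnitude label
magnitude <- cut(eta2_H, breaks = c(0.01, 0.06, 0.14, Inf),
                 labels = c("small", "moderate", "large"), right = FALSE)
print(as.character(magnitude))
```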

library(rstatix)
closures_nonpar <- wilcox_test(Price ~ Closure, paired = FALSE, p.adjust.method = "bonferroni", data = winedata)
closures_nonpar

## # A tibble: 6 × 9
##   .y.   group1       group2    n1    n2 statistic        p    p.adj p.adj.signif
## * <chr> <chr>        <chr>  <int> <int>     <dbl>    <dbl>    <dbl> <chr>       
## 1 Price Natural Cork Screw…   758   407    237795 1.13e-52 6.78e-52 ****        
## 2 Price Natural Cork Synth…   758    21     12377 1.38e- 5 8.28e- 5 ****        
## 3 Price Natural Cork Vinol…   758     6      2996 1.8 e- 1 1   e+ 0 ns          
## 4 Price Screwcap     Synth…   407    21      3960 5.7 e- 1 1   e+ 0 ns          
## 5 Price Screwcap     Vinol…   407     6       695 6.9 e- 2 4.16e- 1 ns          
## 6 Price Synthetic C… Vinol…    21     6        36 1.21e- 1 7.26e- 1 ns

H0: The price distributions of group 1 and group 2 have the same location

H1: They are different

For example, we reject the H0 for Natural Cork and Screwcap at adjusted p-value < 0.001.

And we can’t reject the H0 for Natural Cork and Vinolok (adjusted p-value > 0.05).
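The Bonferroni adjustment multiplies each raw p-value by the number of comparisons (6 here) and caps the result at 1. A short check with base R `p.adjust`, using the p-values as printed in the table (so already rounded):

```r
# Raw p-values copied from the wilcox_test() output (rounded as printed)
p_raw <- c(1.13e-52, 1.38e-5, 0.18, 0.57, 0.069, 0.121)

p_adj <- p.adjust(p_raw, method = "bonferroni")
print(p_adj)

# Same thing by hand: multiply by the number of tests, cap at 1
by_hand <- pmin(1, p_raw * length(p_raw))
print(all.equal(p_adj, by_hand))
```

The fifth value comes out as 0.414 rather than 0.416 only because the printed raw p-value is rounded.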

Conclusion:

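The prices of 0.75 l bottles of wine are not the same for the 4 types of closure (Kruskal-Wallis test, p < 0.001). The effect size is large. In the pairwise comparisons, Natural Cork wines (median 19.99 pounds) differ from Screwcap (median 12.99) and Synthetic Cork (median 13.99) wines; the other pairs do not differ significantly.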

Homework_p1

2024-01-08