In this work I would like to do some manipulation of wine data, previously downloaded from Kaggle (source: https://www.kaggle.com/zynicide/wine-reviews), and make some neat graphs (I think they are neat).
30 11 2020
This time, these packages will be needed:
```r
library(plotly)
library(dplyr)
```
Let's load the data and take a first look at it:
```r
wines <- read.csv(file = './winemag-data-130k-v2.csv')
colnames(wines)
```

```
##  [1] "X"                     "country"               "description"
##  [4] "designation"           "points"                "price"
##  [7] "province"              "region_1"              "region_2"
## [10] "taster_name"           "taster_twitter_handle" "title"
## [13] "variety"               "winery"
```
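Before plotting, it helps to check for missing values. The later `median(wines$price)` output of 25 suggests that rows without a price were dropped in a step not shown here, because by default `median()` returns `NA` when any value is missing. Also, `read.csv()` keeps empty text fields as `""`, so a missing country appears as an empty string and not as `NA`. Below is a minimal sketch of a cleaning step. It uses a small made-up `wines_demo` data frame in place of `wines`. The toy rows and the choice to treat an empty `country` as missing are assumptions:

```r
library(dplyr)

# Hypothetical toy rows standing in for the Kaggle `wines` data frame
wines_demo <- data.frame(
  country = c("US", "US", "France", "Italy", ""),
  points  = c(87, 90, 92, 88, 85),
  price   = c(20, NA, 60, 30, 15)
)

# Number of NA values in each of the three columns of interest
colSums(is.na(wines_demo[c("country", "points", "price")]))

# Keep only rows that have a price and a non-empty country
wines_clean <- wines_demo %>%
  filter(!is.na(price), !is.na(country), country != "")

median(wines_demo$price)   # NA: one price is missing
median(wines_clean$price)  # 30 for these toy rows
```

With the real data, the same `filter()` call applied to `wines` would drop wines without a listed price before any plotting.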
I think country, points and price are the columns best suited for visualization.
The number of wines per country will look good as a histogram. Here are the top countries:
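The code behind this chart is not shown. One way to build it is to count rows per country with `dplyr::count()` and draw the counts as bars with plotly, assuming "the top" means the countries with the most reviews. The toy `wines_demo` rows and the cut-off of three countries are made up for illustration. `plot_ly(wines, x = ~country, type = "histogram")` would let plotly do the counting itself, but precomputed counts make it easier to keep only the top countries in sorted order:

```r
library(plotly)
library(dplyr)

# Hypothetical toy data; with the real data, use `wines` instead
wines_demo <- data.frame(
  country = c("US", "US", "US", "France", "France", "Italy", "Spain", "")
)

# Reviews per country, most frequent first, empty country names dropped.
# Keeping 3 countries is an assumption; the post does not say how many it shows.
top_countries <- wines_demo %>%
  filter(country != "") %>%
  count(country, sort = TRUE) %>%
  slice_head(n = 3)

# Bar chart of the counts; categoryarray keeps the sorted order on the x axis
fig <- plot_ly(top_countries, x = ~country, y = ~n, type = "bar") %>%
  layout(xaxis = list(categoryorder = "array",
                      categoryarray = top_countries$country),
         yaxis = list(title = "Number of reviews"))

# Show the chart in an interactive session (in R Markdown, put `fig` on its own line)
if (interactive()) print(fig)
```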
Price for selected countries will be shown as a boxplot:
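Below is a sketch of such a boxplot using plotly's `box` trace. The hand-picked vector `selected` is illustrative and is not the post's actual choice. The prices in `wines_demo` are toy values:

```r
library(plotly)
library(dplyr)

# Hypothetical toy data standing in for `wines`
wines_demo <- data.frame(
  country = c("US", "US", "US", "France", "France", "France", "Italy", "Italy"),
  price   = c(20, 35, 80, 15, 40, 900, 25, NA)
)

# Countries to compare (an assumed selection)
selected <- c("US", "France", "Italy")

# Keep the selected countries and drop wines without a price
price_by_country <- wines_demo %>%
  filter(country %in% selected, !is.na(price))

# One box per country; plotly computes the quartiles and whiskers itself
fig <- plot_ly(price_by_country, x = ~country, y = ~price, type = "box") %>%
  layout(yaxis = list(title = "Price"))

if (interactive()) print(fig)
```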
A few very expensive wines are outliers, which make the boxplot less informative, so I will list their producers separately:
```
##  [1] "Chateau Margaux"               "Chateau La Mission Haut-Brion"
##  [3] "Chateau Haut-Brion"            "Chateau Mouton Rothschild"
##  [5] "Chateau Petrus"                "Chateau les Ormes Sorbet"
##  [7] "Emmerich Knoll"                "Domaine du Comte Liger-Belair"
##  [9] "Chateau Lafite Rothschild"     "Chateau Cheval Blanc"
## [11] "Blair"
```
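The post does not say how the outliers were picked. This sketch assumes the usual boxplot rule, under which a price is an outlier if it exceeds Q3 + 1.5 × IQR within its country. It flags such rows and lists their wineries. The data and the producer names are hypothetical. Because `quantile()` and plotly may compute quartiles slightly differently, the cut-off can differ a little from where plotly draws its whiskers:

```r
library(dplyr)

# Hypothetical toy data; producer names are made up
wines_demo <- data.frame(
  country = c(rep("France", 6), rep("US", 5)),
  winery  = c("Producer A", "Producer B", "Producer C", "Producer D",
              "Producer E", "Producer X",
              "Producer F", "Producer G", "Producer H", "Producer I",
              "Producer Y"),
  price   = c(20, 25, 30, 35, 40, 1500,
              15, 20, 25, 30, 800)
)

# Flag prices above Q3 + 1.5 * IQR within each country (assumed rule)
flagged <- wines_demo %>%
  filter(!is.na(price)) %>%
  group_by(country) %>%
  mutate(upper = quantile(price, 0.75) + 1.5 * IQR(price),
         outlier = price > upper) %>%
  ungroup()

# Producers of the outlying wines, each listed once
outlier_wineries <- unique(flagged$winery[flagged$outlier])
outlier_wineries
```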
And now the boxplot without outliers:
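One way to redraw the boxplot without the extreme prices is to drop them before plotting, so the y axis rescales to the bulk of the data. This sketch assumes the same per-country Q3 + 1.5 × IQR cut-off and uses toy data. Removing rows also shifts the quartiles, so plotly may still mark a few remaining points as outliers. Passing `boxpoints = FALSE` to `plot_ly()` is an alternative that hides the points without changing the data:

```r
library(plotly)
library(dplyr)

# Hypothetical toy data standing in for `wines`
wines_demo <- data.frame(
  country = c(rep("France", 6), rep("US", 5)),
  price   = c(20, 25, 30, 35, 40, 1500, 15, 20, 25, 30, 800)
)

# Keep prices at or below Q3 + 1.5 * IQR within each country (assumed rule)
no_outliers <- wines_demo %>%
  filter(!is.na(price)) %>%
  group_by(country) %>%
  filter(price <= quantile(price, 0.75) + 1.5 * IQR(price)) %>%
  ungroup()

fig <- plot_ly(no_outliers, x = ~country, y = ~price, type = "box") %>%
  layout(yaxis = list(title = "Price"))

if (interactive()) print(fig)
```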
The overall median price in this dataset is:
```r
# na.rm = TRUE skips wines without a listed price
median(wines$price, na.rm = TRUE)
```

```
## [1] 25
```
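The overall median hides differences between countries. A per-country summary takes only a `group_by()` followed by `summarise()`. In the sketch below the prices are made up, and `na.rm = TRUE` skips missing ones:

```r
library(dplyr)

# Hypothetical toy data standing in for `wines`
wines_demo <- data.frame(
  country = c("US", "US", "US", "France", "France", "Italy", "Italy"),
  price   = c(20, 35, 80, 15, 40, 25, NA)
)

# Median price and number of reviews per country, most expensive first
median_price <- wines_demo %>%
  group_by(country) %>%
  summarise(median_price = median(price, na.rm = TRUE),
            reviews = n()) %>%
  arrange(desc(median_price))

median_price
```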
Only $25? So this is the scale the boxplot needs in order to be clearer. And what about points?
Points will be shown as a boxplot again. The median goes first:
```r
median(wines$points)
```

```
## [1] 88
```
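Below is a sketch of the points boxplot. The overall median is drawn as a dashed horizontal reference line using a plotly layout shape. The selected countries and the scores in `wines_demo` are illustrative assumptions. The median is taken over all rows, and only the selected countries are plotted:

```r
library(plotly)
library(dplyr)

# Hypothetical toy data standing in for `wines`
wines_demo <- data.frame(
  country = c("US", "US", "US", "France", "France", "France",
              "Italy", "Italy", "Chile"),
  points  = c(85, 88, 91, 86, 90, 94, 87, 89, 95)
)

selected <- c("US", "France", "Italy")  # assumed selection
overall_median <- median(wines_demo$points)

points_by_country <- wines_demo %>% filter(country %in% selected)

fig <- plot_ly(points_by_country, x = ~country, y = ~points, type = "box") %>%
  layout(
    yaxis = list(title = "Points"),
    # Dashed line across the full plot width at the overall median
    shapes = list(list(type = "line", xref = "paper", x0 = 0, x1 = 1,
                       y0 = overall_median, y1 = overall_median,
                       line = list(dash = "dash")))
  )

if (interactive()) print(fig)
```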