library(tidyverse)
theme_set(theme_light())
wine_ratings <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-28/winemag-data-130k-v2.csv")
wine_ratings
## # A tibble: 129,971 x 14
## X1 country description designation points price province region_1 region_2
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 0 Italy Aromas inc… Vulkà Bian… 87 NA Sicily … Etna <NA>
## 2 1 Portug… This is ri… Avidagos 87 15 Douro <NA> <NA>
## 3 2 US Tart and s… <NA> 87 14 Oregon Willame… Willame…
## 4 3 US Pineapple … Reserve La… 87 13 Michigan Lake Mi… <NA>
## 5 4 US Much like … Vintner's … 87 65 Oregon Willame… Willame…
## 6 5 Spain Blackberry… Ars In Vit… 87 15 Norther… Navarra <NA>
## 7 6 Italy Here's a b… Belsito 87 16 Sicily … Vittoria <NA>
## 8 7 France This dry a… <NA> 87 24 Alsace Alsace <NA>
## 9 8 Germany Savory dri… Shine 87 12 Rheinhe… <NA> <NA>
## 10 9 France This has g… Les Natures 87 27 Alsace Alsace <NA>
## # … with 129,961 more rows, and 5 more variables: taster_name <chr>,
## # taster_twitter_handle <chr>, title <chr>, variety <chr>, winery <chr>
This is an extension of the TidyTuesday assignment you have already done. Complete the questions below, using the screencast you chose for the TidyTuesday assignment.
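The data contains 129,971 wine reviews. Each row records the rating a wine received (points) along with variables such as the country the wine is from, its price, its variety and winery, and the taster's notes on the wine (description).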
Hint: One graph of your choice.
ggplot(wine_ratings, aes(price, points)) +
  geom_point(alpha = .1) +
  geom_smooth(method = "lm") +
  scale_x_log10()
summary(lm(points ~ log2(price), wine_ratings))
##
## Call:
## lm(formula = points ~ log2(price), data = wine_ratings)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.0559 -1.5136 0.1294 1.7133 9.2408
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 78.981419 0.035765 2208 <2e-16 ***
## log2(price) 1.974162 0.007338 269 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.408 on 120973 degrees of freedom
## (8996 observations deleted due to missingness)
## Multiple R-squared: 0.3744, Adjusted R-squared: 0.3744
## F-statistic: 7.239e+04 on 1 and 120973 DF, p-value: < 2.2e-16
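This graph plots each wine's rating (points) against its price on a log scale, with a fitted linear trend. More expensive wines tend to receive higher ratings: in the linear model, each doubling of price is associated with about 1.97 more points, and log price explains about 37% of the variation in points. The 8,996 observations with missing values are left out of the model.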
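Because the model uses log2(price), its slope can be read as the change in fitted points per doubling of price. The sketch below illustrates that reading using the rounded estimates printed in the summary output above. The helper `predict_points()` and the example prices are hypothetical and are not part of the original analysis.

```r
# Rounded estimates copied from the summary() output above
intercept <- 78.981419
slope_log2_price <- 1.974162

# Hypothetical helper: fitted points for a given price
predict_points <- function(price) {
  intercept + slope_log2_price * log2(price)
}

# Illustrative prices, each double the previous one
prices <- c(10, 20, 40)
fitted <- predict_points(prices)
fitted

# Each doubling of price adds the slope to the fitted points
diff(fitted)

# Rows used by the model: total rows minus rows dropped due to missingness
n_used <- 129971 - 8996

# Residual degrees of freedom: rows used minus the two estimated coefficients
n_used - 2
```

The last line reproduces the 120973 residual degrees of freedom reported in the summary, which is a quick check that the dropped rows account for the difference between the tibble's row count and the model's sample size.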