Choose one of David Robinson’s tidytuesday screencasts, watch the video, and summarise. https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ
Tidy Tuesday screencast: predicting wine ratings
It was published on May 31st, 2019.
The data came from a link on GitHub. There are about 130,000 observations, and the variables include the title, variety, and winery. The title is the title of the wine review, which often contains the vintage (year); variety is the grape type used to make the wine; and winery is the winery that made the wine. In the first R chunk he added a year column, and he also looked at country/region, designation, taster name, variety, and winery. In one of the graphs, the y axis is the count of each unique word token for each id.
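The year column described above can be pulled out of the title with a regular expression. This is only a minimal sketch in Python (the screencast itself works in R); the function name `extract_year`, the example titles, and the choice to match only years of the form 20xx are all assumptions made for illustration.

```python
import re

def extract_year(title):
    """Return the first 20xx-style year found in a wine title, or None."""
    match = re.search(r"\b(20\d\d)\b", title)
    return int(match.group(1)) if match else None

# Hypothetical titles, not taken from the real data.
titles = [
    "Hypothetical Winery 2013 Reserve Pinot Noir (Example Valley)",
    "Hypothetical Cellars Sparkling Blend (Example Coast)",
]
years = [extract_year(t) for t in titles]
```

Titles without a vintage give `None`, so those wines can be left out of any graph by year.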
Dave imported the data by reading it from the GitHub link that has the wine ratings data. He reviewed the data and slimmed it down to the country/region of the wine, the designation of the wine, the taster name, the variety (which grapes were used), and the winery it came from. He created bar graphs, starting with year and count, and added more graphs using the other variables. He also fit multiple linear models and created a ggplot at the end.
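The slimming-down and counting steps above could look roughly like this in Python. It is a sketch under assumptions, not what was done in the video: the column names and the three inline rows are made up to stand in for the real file, which the video reads from GitHub.

```python
import csv
import io
from collections import Counter

# Hypothetical rows standing in for the real file; all values are made up.
raw = """country,designation,taster_name,variety,winery,year,points
US,Reserve,Taster A,Pinot Noir,Winery One,2013,90
US,,Taster B,Chardonnay,Winery Two,2014,88
France,Brut,Taster A,Champagne Blend,Winery Three,2013,92
"""

# Keep only the columns of interest (assumed names).
keep = ["country", "designation", "taster_name", "variety", "winery", "year"]
rows = [{col: row[col] for col in keep} for row in csv.DictReader(io.StringIO(raw))]

# Count wines per year: the numbers a year-versus-count bar graph would show.
wines_per_year = Counter(row["year"] for row in rows)
```

Swapping `"year"` for another kept column, such as `"country"`, gives the counts for the other bar graphs.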
In the video, Dave added color, labels, and a title to the graph. We also learned how to create a ggplot, and he created one at the end.
At the end, Dave fit a lasso regression on the words used in the reviews and looked at 4 random wines, showing whether each word counted as positive or negative towards the wine. He found that most wines have more positive words than negative ones and that each positive word has more of an effect on the wine's rating.
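To make the idea of a lasso on review words more concrete, here is a small sketch in plain Python. It is not the code from the video, which uses R. The six reviews and ratings are made up, the penalty of 0.1 is an arbitrary choice, and the names `soft_threshold`, `predict`, and `fit_lasso` are hypothetical. The fit uses coordinate descent, one common way to solve a lasso.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def predict(intercept, coefs, row):
    return intercept + sum(c * v for c, v in zip(coefs, row))

def fit_lasso(rows, targets, penalty, sweeps=200):
    """Coordinate descent for the lasso with an unpenalised intercept."""
    n, p = len(rows), len(rows[0])
    intercept, coefs = sum(targets) / n, [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = norm = 0.0
            for row, y in zip(rows, targets):
                # Residual with feature j's own contribution added back in.
                partial = y - predict(intercept, coefs, row) + coefs[j] * row[j]
                rho += row[j] * partial
                norm += row[j] ** 2
            coefs[j] = soft_threshold(rho / n, penalty) / (norm / n) if norm else 0.0
        # Re-centre the intercept on the remaining residuals.
        intercept += sum(y - predict(intercept, coefs, r) for r, y in zip(rows, targets)) / n
    return intercept, coefs

# Made-up reviews and ratings, for illustration only.
reviews = [("elegant rich", 93), ("elegant", 92), ("thin sour", 83),
           ("thin", 84), ("rich", 90), ("sour", 85)]
vocab = sorted({word for text, _ in reviews for word in text.split()})
# One 0/1 column per word: 1.0 if the word appears in the review.
rows = [[1.0 if word in text.split() else 0.0 for word in vocab] for text, _ in reviews]
targets = [float(points) for _, points in reviews]

intercept, coefs = fit_lasso(rows, targets, penalty=0.1)
weights = dict(zip(vocab, coefs))  # word -> effect on the predicted rating
```

A word with a positive weight pushes the predicted rating up and a word with a negative weight pushes it down, which is the kind of per-word effect shown for the 4 wines.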
What I liked most about the analysis is that it shows how positive or negative wording relates to different wines. He took 4 wines that people had tasted and showed, based on the words in their descriptions, whether each wine came across as good or bad.