Choose one of David Robinson’s tidytuesday screencasts, watch the video, and summarise. https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ
You must follow the instructions below to get credit for this assignment.
The title of the Screencast is ‘“The Riddler” Screencast: Simulating A Week Of Rain’, by David Robinson.
This screencast was published to YouTube on December 12, 2018.
Hint: What’s the source of the data; what does the row represent; how many observations?; what are the variables; and what do they mean?
The reason behind Dave’s statistical analysis, and thus the reason for this entire project, is to answer a question from ‘The Riddler Express’ column on fivethirtyeight.com. The question was about a man’s commute to work and the likelihood of him getting wet from the rain during that commute.
The data did not come from an external source. Dave loaded library(tidyverse) for its functions and then simulated the data himself with crossing(), starting from a trial variable. crossing() established the rows of data, 500 in total, which represent 100 simulations of weather patterns between Monday and Friday, with one row per simulated weekday. The values of ‘home’, ‘office’, ‘start’, etc. were established simply by entering them into the R code chunk.
Additional variables were added later in the video to simulate a 50% probability of rain in the morning and a 40% probability of rain in the evening.
Taken together, the values and variables record whether it rained in the morning or the evening of each day between Monday and Friday and, if it rained at all on a given day, when it began.
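A minimal sketch of this setup, not Dave's actual code: it assumes 100 trials crossed with 5 weekdays (giving the 500 rows described above) and uses the 50% and 40% rain probabilities from the video. The column names rain_morning and rain_evening and the seed value are my own choices for illustration.

```r
library(tidyr)
library(dplyr)

set.seed(1)  # arbitrary seed (an assumption), only so the sketch is repeatable

# One row for every combination of trial and weekday: 100 * 5 = 500 rows
simulations <- crossing(trial = 1:100, weekday = 1:5) %>%
  mutate(
    # 1 = rain, 0 = no rain; column names are hypothetical
    rain_morning = rbinom(n(), 1, 0.5),  # 50% chance of rain in the morning
    rain_evening = rbinom(n(), 1, 0.4)   # 40% chance of rain in the evening
  )

nrow(simulations)
```

Each row is one simulated weekday within one simulated week, so any later step can use group_by(trial) to look at a whole week at a time.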
Hint: For example, importing data, understanding the data, data exploration, etc.
Dave began his solution to the problem by loading the tidyverse packages with library(tidyverse). He then used crossing() on a trial variable, together with a mutate function and weekday variables, to run 100 simulations of weather patterns from Monday to Friday. The remaining values were simply added into the R code chunk to establish the day, the time, and the location the man would be in when it did or did not rain.
Dave backtracked frequently during the video, rewriting the majority of his code in order to find the most efficient and easy-to-interpret way to answer the question.
His overall approach to each step (by the end of the video) seemed to be to constantly check the information and code he had entered and change it to better isolate each variable and find what he needed. It is easy to tell that Dave has a greater understanding of how to enter data than the class does, because he wrote each step at least two or three different ways before he found an approach that he thought would be the easiest for viewers to understand and interpret.
Finally, at the end of the video, Dave created a graph depicting the days and times of the week when the man would most frequently get wet from the rain.
The beginning of the video was familiar because Dave used library(tidyverse). I knew that this is where his functions were coming from, just as in our other assignments. I was also able to recognise his mutate and group_by functions because we have had to use the same ones.
There were also some functions in the analysis that I did not recognise, such as set.seed, which fixes the seed of the random number generator so that a simulation gives the same results every time it is run. crossing() was also new to me, but I was able to work out why it was used from the information he entered. It builds one row for every combination of the values it is given, which sets up the cases that get simulated.
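To check my understanding of these two functions, here is a small sketch of my own, not taken from the video. The inputs and the seed value 42 are arbitrary choices for illustration.

```r
library(tidyr)

# crossing() returns one row per combination of its inputs
grid <- crossing(trial = 1:2, weekday = c("Mon", "Tue", "Wed"))
nrow(grid)  # 2 trials * 3 weekdays = 6 rows

# set.seed() fixes the random number generator's starting point,
# so the same random draws come out each time
set.seed(42)
first_draw <- rbinom(5, 1, 0.5)
set.seed(42)
second_draw <- rbinom(5, 1, 0.5)
identical(first_draw, second_draw)  # TRUE
```

Without the second call to set.seed, the two sets of draws would generally differ.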
The major finding from the analysis was that Friday mornings, followed by Thursday mornings, were the most likely times for the man to get wet from the rain. He would almost always be dry on Mondays, regardless of the time of day. He also seemed less likely to get wet in the evenings because there was a lower chance of rain in the evenings.
What I liked most about the analysis was seeing how many different ways there are to get the same information. During the video, Dave erased his work and rewrote it for nearly every step of his process, which goes to show that coding in R is very non-linear: a coder does not have to stick to one specific function to solve a certain problem.
It was also interesting to see that he used some of the same techniques that we had learned in class. I was expecting him to use advanced techniques that I would not be able to follow, but instead he stuck to the basics that we learned over the course of the semester, and it really shows just how useful those can be when it comes to coding and interpreting data in R.