Choose one of David Robinson’s tidytuesday screencasts, watch the video, and summarise. https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ

Instructions

You must follow the instructions below to get credits for this assignment.

Q1 What is the title of the screencast?

The title of the screencast is “Riddler: Simulating Rolling a non-increasing sequence”.

Q2 When was it published?

This video was published on April 6, 2020.

Q3 Describe the data

Hint: What’s the source of the data; what does the row represent; how many observations?; what are the variables; and what do they mean?

The data are simulated rather than collected. The puzzle uses a 10-sided die labeled 0-9 and a sheet of paper to record your “score”. To begin the game, you roll the die, and the digit shown divided by 10 is your current “score” (the first digit after the decimal point). You keep rolling, and if the digit shown is less than or equal to the last digit recorded, it becomes the new last digit of your “score”. The game ends when a 0 is rolled. Dave wants to find the average final score in the game. Each row represents one roll of the die, and the variables are the digit rolled (0-9), group (which game the roll belongs to), and previous_min (the smallest digit rolled so far in that game). A roll counts toward the score only if it is less than or equal to that cumulative minimum, so the recorded digits never increase. By the end of the video he samples 10 million rolls, which is about 1 million games.
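The rules above can be sketched as a single game in code. This is a minimal Python illustration of the puzzle as described here, not the R code from the video; the function name play_game and the seed are hypothetical choices made for the example.

```python
import random

def play_game(rng):
    """Play one game and return (kept_digits, score)."""
    kept = []
    while True:
        roll = rng.randrange(10)          # 10-sided die labeled 0-9
        if not kept or roll <= kept[-1]:  # keep only digits that do not increase
            kept.append(roll)
            if roll == 0:                 # a 0 ends the game
                break
    # the i-th kept digit (counting from 1) contributes digit * 10**-i
    score = sum(d * 10 ** -(i + 1) for i, d in enumerate(kept))
    return kept, score

rng = random.Random(2020)  # arbitrary seed, only for reproducibility
digits, score = play_game(rng)
print(digits, round(score, 6))
```

Every game ends with a 0 because a 0 is always less than or equal to the last recorded digit, so it is always kept.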

Q4-Q5 Describe how Dave approached the analysis each step.

Hint: For example, importing data, understanding the data, data exploration, etc.

Dave approached the analysis by simulating many games in R and averaging the resulting scores. He takes an efficient approach that combines dplyr with the cumulative window functions (cumsum, cummin, and so on). He starts with a tibble of rolls: he samples 1,000 rolls from the digits 0-9 with replace = TRUE. He wants to do all of the rolls up front and then divide them into individual groups (sequences) based on where the 0s fall.

To separate the groups formed by the data, he uses group = cumsum(roll == 0). cumsum is a built-in function that is helpful for an approach like this. Dave wraps it in lag from dplyr so that the group number changes one row later (the groups run in sequence 0, 1, 2, 3, ...), which keeps each 0 in the game it ends. The 1,000 rolls ended up dividing into 107 groups. He did not waste any rolls: he generated all 1,000 at once and then split them into the separate times the game was played.

Next, within each group, he wants to remove the rolls that are greater than any previous value, which he does with group_by(group) and a grouped filter. He first writes previous_max = lag(cummax(roll), default = Inf), then realizes that he needs to keep rolls less than or equal to the previous minimum, not the maximum. He also takes out the lag, saying he does not need one. The filtered data showed many 1’s, so he removes the filter to check whether that is accurate, and concludes that it makes sense. He then changes the 1,000 rolls to 1 million rolls, which gives about 100,000 groups.

Now he needs the scores. He defines decimal as the roll multiplied by 10 to the power of the negative row number within its group, so the kept rolls are multiplied by 1/10, 1/100, 1/1000, and so on. He then summarizes the data by group so that the score is the sum of decimal. Lastly, he samples 10 million rolls, which gives about a million simulated scores. With that distribution of scores he can answer the puzzle: summarizing the scores with the mean gives 0.474 as the average score in this game of decreasing decimals.

This result was higher than he expected; he thought it would be closer to 0. He then plots the scores as a histogram in ggplot and notices a fractal pattern: once a digit has been recorded, no later digit can go above it.
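The steps described in this answer can be sketched end to end. The version below is a Python approximation of the described dplyr pipeline, not Dave's actual code: the function name simulate_scores, the seed, and the decision to drop an unfinished final game are assumptions made for the illustration.

```python
import random
from itertools import accumulate

def simulate_scores(n_rolls, seed=2020):
    """Roll everything up front, split into games at each 0, and score each game."""
    rng = random.Random(seed)
    rolls = [rng.randrange(10) for _ in range(n_rolls)]  # like sampling 0:9 with replacement

    # group = lagged cumulative count of zeros, so each 0 closes its own game
    zeros_so_far = list(accumulate(int(r == 0) for r in rolls))
    groups = [0] + zeros_so_far[:-1]

    scores = []
    current_group, running_min, position, score = 0, 10, 0, 0.0
    for roll, group in zip(rolls, groups):
        if group != current_group:   # previous game ended: store its score and reset
            scores.append(score)
            current_group, running_min, position, score = group, 10, 0, 0.0
        if roll <= running_min:      # keep rolls at or below the cumulative minimum
            running_min = roll
            position += 1            # row number within the game, after filtering
            score += roll * 10 ** -position
    # assumption: an unfinished last game (no closing 0) is discarded
    if rolls and rolls[-1] == 0:
        scores.append(score)
    return scores

scores = simulate_scores(200_000)
print(len(scores), sum(scores) / len(scores))
```

Because a 0 turns up about once in every ten rolls, the number of games is roughly a tenth of the number of rolls, which matches the 1,000 rolls splitting into about 100 groups.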

Q6 Did you see anything in the video that you learned in class? Describe.

When Dave uses ggplot to create a histogram of the scores, that is something we learned about in class. He also uses tidyverse functions to filter, group by, and summarize, which we learned and discussed in class.

Q7 What is a major finding from the analysis.

A major finding from the analysis is that 0.474 is the average score in this game of decreasing decimals, based on a sample of 10 million rolls. To answer the riddle, he relied on cumulative window functions. They let you express simple recursive relationships, that is, keeping track of what has already been rolled, with the cumsum and cummin functions. Grouping with group_by was helpful for expressing that the game ends when a 0 is rolled.

Q8 What is the most interesting thing you really liked about the analysis.

The most interesting thing I liked about the analysis was that the average score was higher than expected. Like Dave said, I figured that with each digit never exceeding the previous value, the average would be closer to 0. Dave creating the histogram to show the data more clearly was also interesting. He pointed out that it was almost a fractal pattern: within each shape, you cannot go up.

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.