Choose one of David Robinson’s tidytuesday screencasts, watch the video, and summarise. https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ
You must follow the instructions below to get credits for this assignment.
Tidy Tuesday live screencast: Analyzing the Tour De France data in R.
The objective of this video is to analyze data on the different Tour de France riders to try to figure out why they ranked the way they did over the many years the event has been held.
The video was published and performed live on April 7th, 2020.
Hint: What’s the source of the data; what does the row represent; how many observations?; what are the variables; and what do they mean?
The data comes from a Tour de France package that Dave Robinson installed at the beginning of the video. Throughout the hour-long video he makes tons of observations about the winners of the Tour de France throughout history. What's really cool about this specific video, and something he said he had never done before, is that he takes requests from the people in the live chat and bases his analysis on what they want to see. To dissect the winners, he breaks them down into several different groups to find patterns. The first thing he did was look at the age, weight, and height of all the winners to check for any sort of correlation. Then he analyzed the countries that have won to see which country is the most dominant; the United States ranked 4th because of Lance Armstrong. After that he got really creative and broke the races down even further. He noticed that the average speed has really picked up over the years, with major advancements since the tradition started over 100 years ago. He also did a cool part on the winning margin: back when the Tour de France first started, the winning margin in minutes was huge, and it has shrunk severely over the years. He analyzed a lot more interesting things as well, such as the life expectancy of the winners, the length of the course, and even the countries of origin of the winners over time. The goal of all this is to find patterns and commonalities in the data, identify trends, and say what worked in the past and what is working now.
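The country comparison described above comes down to counting wins per country and sorting the counts. Here is a minimal Python sketch of that idea. Dave worked in R, so this is not his code. The function name `rank_countries_by_wins` and the country labels are hypothetical placeholders, not values from the real data.

```python
from collections import Counter


def rank_countries_by_wins(winner_countries):
    """Return (country, wins) pairs ordered from most wins to fewest."""
    return Counter(winner_countries).most_common()


# Made-up placeholder values, one entry per winning edition.
# These are NOT real Tour de France results.
winner_countries = [
    "Country A", "Country B", "Country A",
    "Country C", "Country A", "Country B",
]

ranking = rank_countries_by_wins(winner_countries)
for rank, (country, wins) in enumerate(ranking, start=1):
    print(rank, country, wins)
```

With the real data, the same count would show where each country lands in the ranking.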
Hint: For example, importing data, understanding the data, data exploration, etc.
What I thought was really cool about this video was that Dave analyzed things based on what his audience wanted to see. People in the chat were suggesting all sorts of things to look at, and he would write the code to pull out that specific piece of the data and analyze it. For each request, he would read the question on the side, write the code, and put the result into a line chart so that he could see the changes over the years. Once the chart was up, he would enlarge it and then talk about the correlations and why things played out the way they did. One cool example was when he wanted to dissect the margin of victory over the years, so he wrote the code, produced a graph from the data, and talked through it. In the graph it was very easy to see that when the race first started, the initial winners would win by a long span of time, and that this number has decreased drastically over the years until now, where the margin of victory is only a few minutes. He used this and many other views of the data to conclude that as the bikes advanced, the overall speed of the race increased as well, which has put more riders in a position to win right at the very end.
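To make the margin-of-victory example concrete, here is a minimal Python sketch of the kind of summary that could sit behind such a chart: group the winners by decade and average the winning margin. This is an illustration, not the R code from the video. The field names `year` and `margin_minutes`, the helper `mean_margin_by_decade`, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_margin_by_decade(rows):
    """Average winning margin (in minutes) per decade, ordered by decade."""
    by_decade = defaultdict(list)
    for row in rows:
        decade = (row["year"] // 10) * 10  # e.g. 1905 -> 1900
        by_decade[decade].append(row["margin_minutes"])
    return {decade: mean(values) for decade, values in sorted(by_decade.items())}


# Made-up placeholder rows, chosen only to show the shape of the calculation.
# These are NOT real Tour de France margins.
sample_rows = [
    {"year": 1903, "margin_minutes": 170.0},
    {"year": 1905, "margin_minutes": 150.0},
    {"year": 2001, "margin_minutes": 6.0},
    {"year": 2005, "margin_minutes": 4.0},
]

margins = mean_margin_by_decade(sample_rows)
for decade, margin in margins.items():
    print(decade, margin)
```

Averaging by decade smooths out single unusual years, which makes a long-term trend easier to read on a line chart.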
The big thing he kept coming back to in order to show his data was the line chart. I remember making line charts in class to represent data over time. We made just one big one for the housing market, but David made tons of them to analyze how different winners performed over the years in the Tour de France, which helped him draw his major conclusions. One example was the average speed of the racers: 100 years ago the line was really low, and then over about a 10-year span it skyrocketed to a much faster pace.
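Average speed is just distance divided by time, so the points on a speed-over-time line chart can be computed directly from each race's distance and the winner's overall time. The Python sketch below shows that calculation under stated assumptions. The field names `year`, `distance_km`, and `time_hours`, both helper functions, and the sample rows are hypothetical and not taken from the video.

```python
def average_speed_kmh(distance_km, time_hours):
    """Average speed in km/h: distance divided by elapsed time."""
    if time_hours <= 0:
        raise ValueError("time_hours must be positive")
    return distance_km / time_hours


def speed_series(rows):
    """Return (year, speed) pairs sorted by year, ready to plot as a line."""
    return sorted(
        (row["year"], average_speed_kmh(row["distance_km"], row["time_hours"]))
        for row in rows
    )


# Made-up placeholder rows. These are NOT real Tour de France figures.
sample_rows = [
    {"year": 2000, "distance_km": 3600.0, "time_hours": 90.0},
    {"year": 1910, "distance_km": 4500.0, "time_hours": 180.0},
]

series = speed_series(sample_rows)
for year, speed in series:
    print(year, speed)
```

Each pair in `series` is one point on the chart, with the year on the x-axis and the speed on the y-axis.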
One really cool conclusion he came to over the course of the video is that it's not the people who are advancing, it is the equipment. What I mean by this is that he noticed the race times were getting significantly faster over the years and the winning margins were also shrinking significantly. He tried many different variables to determine why this would be the case. His first thought was that maybe the athletes were getting better; however, since he knew all of the winners' attributes, he could see that there were no distinct patterns among the different racers. In the end, the conclusion was that the bikes were getting better and could be made to move faster and more efficiently, allowing the average speed to increase significantly.
I am a really big sports guy, so I love comparing data on different athletes to try to determine what makes one athlete dominant and another just alright. That made this a really interesting video for me, since it tries to determine how the riders' speeds improved and what physical attributes the winning athletes had. I also really liked the conclusion I described in the paragraph above, that the athletes were advancing because the bikes were advancing. I think this is such an interesting thought process. We talk all the time about the all-time greats in sports and who is the best of all time, but I think advancements in technology and our knowledge of medical science play a huge role in that. Athletes are better today because we as a society know much more than we did 30-40 years ago about what athletes can do to become more dominant.