Choose one of David Robinson’s tidytuesday screencasts, watch the video, and summarise. https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ
You must follow the instructions below to get credits for this assignment.
Analyzing Netflix titles in R
April 20th, 2021
Hint: What’s the source of the data; what does the row represent; how many observations?; what are the variables; and what do they mean?
Dave got his data from the TidyTuesday website; it originally came from Kaggle. The data set was labeled netflix_titles. Each row represents a different movie or TV show and lists the program type (movie or TV show), the title, the duration, the cast, the director, the release date, and the rating. Together, the variables lay out the logistics of each movie and TV show during the period of time he wanted to look at (5 years, 10 years, etc.).
Hint: For example, importing data, understanding the data, data exploration, etc.
Dave used several different approaches when working through the data. He started by creating a histogram of the whole data set by release year. He then set the binwidth to 5 so that each bar covered a five-year span of release years. He also used “fill = type” to distinguish movies from TV shows, then created two separate histograms using “facet_wrap(~ type, ncol = 1)”. He looked through the console quite often to figure out the most efficient way to arrange the data. He used commands like “separate(duration, c("duration", "units"), sep = " ", convert = TRUE)” to split duration into a numeric value and its units. Overall, Dave was quick and efficient in dealing with the data throughout the video. I could go on and on about the different commands he used and how impressed I was by his speed and thoroughness with this data set. He is able to use commands quickly to arrange the data in R exactly the way he wants it.
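To make the commands above concrete, here is a minimal sketch of the same steps. It assumes the dplyr, tidyr, and ggplot2 packages are installed. The toy_titles data frame is a made-up stand-in for netflix_titles, and the column name release_year is my assumption, so this is not Dave's exact code.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for netflix_titles; the values are invented for illustration.
toy_titles <- tibble(
  type = c("Movie", "Movie", "TV Show", "TV Show", "Movie"),
  title = c("A", "B", "C", "D", "E"),
  release_year = c(2005, 2018, 2019, 2020, 2016),
  duration = c("90 min", "120 min", "2 Seasons", "1 Season", "101 min")
)

# Split "90 min" into a number (90) and its units ("min").
# convert = TRUE turns the new duration column into a numeric type.
toy_split <- toy_titles %>%
  separate(duration, c("duration", "units"), sep = " ", convert = TRUE)

# Histogram of release year in 5-year bins, colored by type,
# with movies and TV shows stacked in separate panels.
release_plot <- ggplot(toy_split, aes(release_year, fill = type)) +
  geom_histogram(binwidth = 5) +
  facet_wrap(~ type, ncol = 1)

release_plot
```

After the separate() step, duration holds numbers and units holds text such as "min" or "Seasons", which is what makes it possible to plot or summarise duration.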
Dave uses many more of the basic skills I learned in class than I expected. He was very precise about when he used commands such as ggplot to make a histogram. Dave took the commands we learned in class to a new level. As I described in the question above, he used commands to arrange the data and to put movies and TV shows on separate histograms. He explored the data set with ease and got the data to appear exactly where he wanted it. He seemed extremely in control of the data at all times.
A major finding for me was how exactly to sort and read box plots. I knew how to make them and could read simple ones, but Dave analyzed the box plot pretty thoroughly, which improved my overall understanding of how to read and alter box plots. He reordered the boxes in the original boxplot by median to make it easier to read. Some of the older information made the graph harder to read, so sorting by median made the information much easier to interpret.
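One common way to sort a boxplot by median is fct_reorder() from the forcats package, sketched below. The toy_movies data and the choice of rating and duration as the plotted variables are my assumptions for illustration; the video may have used different columns or a different function.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical data: three ratings with made-up durations in minutes.
toy_movies <- tibble(
  rating = c("PG", "PG", "PG", "R", "R", "R", "G", "G", "G"),
  duration = c(140, 150, 145, 120, 130, 125, 80, 85, 90)
)

# fct_reorder() reorders the rating levels by a summary of duration;
# its default summary function is the median.
toy_ordered <- toy_movies %>%
  mutate(rating = fct_reorder(rating, duration))

# Horizontal boxplot: the boxes now appear in order of median duration.
ggplot(toy_ordered, aes(duration, rating)) +
  geom_boxplot()
```

Without the fct_reorder() step the boxes would appear in alphabetical order (G, PG, R); with it they follow the medians (G, R, PG), which is easier to read.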
It’s extremely difficult for me to pick one thing that was interesting to me. If I had to pick, it would be the number of commands he used throughout the episode to arrange the data in any way he wanted. He was able to arrange the data, select the data he wanted to view, and filter by date added. I would never have looked through a data set as intricately as Dave does. There was a lot of information I wasn’t seeing because of my lack of data exploration throughout the course.
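The three steps mentioned above (filter, arrange, select) can be chained together with dplyr, as in this small sketch. The toy_added data, the date_added column stored as a Date, and the 2020 cutoff are all assumptions made for the example, not values from the video.

```r
library(dplyr)

# Hypothetical data with made-up dates.
toy_added <- tibble(
  title = c("A", "B", "C", "D"),
  type = c("Movie", "TV Show", "Movie", "TV Show"),
  date_added = as.Date(c("2019-09-09", "2021-01-15", "2020-06-01", "2018-03-20")),
  rating = c("PG", "TV-MA", "R", "TV-14")
)

recent_titles <- toy_added %>%
  filter(date_added >= as.Date("2020-01-01")) %>%  # keep titles added in 2020 or later
  arrange(desc(date_added)) %>%                    # newest first
  select(title, type, date_added)                  # keep only the columns of interest

recent_titles
```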