Tidytuesday Screencast

Instructions
Q1 What is the title of the screencast?
Q2 When was it published?
Q3 Describe the data
Q4-Q5 Describe how Dave approached the analysis each step.
Q6 Did you see anything in the video that you learned in class? Discuss in a short paragraph.
Q7 What is a major finding from the analysis.
Q8 What is the most interesting thing you really liked about the analysis.
Q9 Display the title and your name correctly at the top of the webpage.
Q10 Use the correct slug.

Choose one of David Robinson’s tidytuesday screencasts, watch the video, and summarise. https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ

Instructions

You must follow the instructions below to get credits for this assignment.

Elaborate your answer. One or two sentence answers won’t get credit.
Make sure to cite what you see and hear in the screencast.

Q1 What is the title of the screencast?

Analyzing Netflix titles in R

Q2 When was it published?

April 20th, 2021

Q3 Describe the data

Hint: What’s the source of the data; what does the row represent; how many observations?; what are the variables; and what do they mean?

Dave got his data from the TidyTuesday project; the data set originally comes from Kaggle. It is labeled netflix_titles. Each row represents a different movie or TV show. Listed in each row are the program type (movie or TV show), the title, the duration, the cast, the director, the release date, and the rating. Together, the variables lay out the details of each movie and TV show, which let him look at whatever period of time he wanted (the last 5 years, 10 years, etc.).

Q4-Q5 Describe how Dave approached the analysis each step.

Hint: For example, importing data, understanding the data, data exploration, etc.

Dave used several different approaches to explore the data. He started by plotting a histogram of release year to see how the titles are distributed over time. He then set the binwidth to 5 so that each bar covers a five-year span of releases. He also used “fill = type” to color movies and TV shows differently, then split them into two separate histograms with “facet_wrap(~ type, ncol = 1)”. He checked the console output often to decide how to reshape the data next. For example, he used “separate(duration, c("duration", "units"), sep = " ", convert = TRUE)” to split duration into a number and its units, so that duration is stored as a numeric value. Overall, Dave was quick and efficient in dealing with the data throughout the video. I could go on about the different commands he used, and I was impressed by both his speed and how thorough he was with this data set.
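To make these steps concrete, here is a minimal sketch in R. It is not Dave's actual code: the tibble below is made-up data that only imitates the shape of netflix_titles, and the column names release_year and duration are assumptions about the real data set. The separate(), geom_histogram(binwidth = 5), fill = type, and facet_wrap() calls are the ones quoted above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical rows that only imitate the shape of netflix_titles
titles <- tibble(
  title        = c("A", "B", "C", "D", "E", "F"),
  type         = c("Movie", "Movie", "TV Show", "Movie", "TV Show", "TV Show"),
  release_year = c(2008, 2016, 2017, 2019, 2019, 2020),
  duration     = c("93 min", "110 min", "2 Seasons", "88 min", "1 Season", "4 Seasons")
)

# Split "93 min" into a number and its unit.
# convert = TRUE turns the number part into an integer column.
titles_split <- titles %>%
  separate(duration, c("duration", "units"), sep = " ", convert = TRUE)

# Histogram of release year in five-year bins, colored by type,
# with one panel per type stacked in a single column
p <- ggplot(titles_split, aes(release_year, fill = type)) +
  geom_histogram(binwidth = 5) +
  facet_wrap(~ type, ncol = 1)

print(p)
```

After separate(), duration holds numbers such as 93 and units holds text such as "min", which is what makes duration usable in plots and summaries.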

Q6 Did you see anything in the video that you learned in class? Discuss in a short paragraph.

Dave used more of the basic skills I learned in class than I expected. He was deliberate about when to use commands such as ggplot to make a histogram, and he took the commands we learned in class to a new level. As I described in the question above, he used commands to explore the data and to show movies and TV shows on separate histograms. He explored the data set with ease and got each plot to show exactly what he wanted. He seemed in control of the data at all times.

Q7 What is a major finding from the analysis.

A major takeaway for me from the analysis was how to order and read box plots. I knew how to make them and could read simple ones, but Dave walked through the box plot thoroughly, which improved my understanding of how to read and alter box plots. He reordered the categories in the original boxplot by median to make it easier to read. Some of the older information made the graph harder to read, so ordering by median made it much easier to interpret.
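As a rough illustration of ordering a boxplot by median, the sketch below uses fct_reorder() from the forcats package, which is one common way to do it in R. The function choice, the genre and duration columns, and all of the values are assumptions made for this example and are not taken from the screencast.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical movie lengths in minutes for three made-up genres
movies <- tibble(
  genre    = c("Comedy", "Comedy", "Comedy",
               "Drama", "Drama", "Drama",
               "Kids", "Kids", "Kids"),
  duration = c(95, 100, 105, 110, 120, 150, 60, 75, 80)
)

# Reorder the genre factor by each genre's median duration,
# so the boxes appear in order of median instead of alphabetically
movies_ordered <- movies %>%
  mutate(genre = fct_reorder(genre, duration, .fun = median))

p_box <- ggplot(movies_ordered, aes(duration, genre)) +
  geom_boxplot()

print(p_box)
```

With the boxes in median order, it is easier to compare neighboring categories than when they are listed alphabetically.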

Q8 What is the most interesting thing you really liked about the analysis.

It’s difficult for me to pick just one thing. If I had to choose, the most interesting part was the number of commands he used throughout the episode to view the data in any way he wanted. He was able to arrange data, select the data he wanted to view, and filter by date added. I would never have looked through a data set as closely as Dave does. There was a lot of information I wasn’t seeing because of how little data exploration I did throughout the course.
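The arrange, select, and filter steps mentioned above can be sketched with dplyr as follows. The rows are made up, the date_added column name is an assumption about the data set, and the 2019 cutoff is only an example; this is not the exact pipeline from the video.

```r
library(dplyr)

# Hypothetical rows; date_added is assumed to be a Date column
titles <- tibble(
  title      = c("A", "B", "C", "D"),
  type       = c("Movie", "TV Show", "Movie", "TV Show"),
  date_added = as.Date(c("2016-03-01", "2019-07-15", "2020-01-10", "2018-11-30"))
)

recent <- titles %>%
  filter(date_added >= as.Date("2019-01-01")) %>%  # keep titles added in 2019 or later
  select(title, date_added) %>%                    # keep only the columns of interest
  arrange(desc(date_added))                        # newest additions first

print(recent)
```

Chaining the three verbs with the pipe keeps each step readable, which is part of why this kind of exploration can be done so quickly.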

how a data scientist approaches data analysis

Blake Greene