Choose one of David Robinson’s tidytuesday screencasts, watch the video, and summarise. https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ

Instructions

You must follow the instructions below to get credits for this assignment.

Q1 What is the title of the screencast?

The title of this screencast is “analyzing college major & income data in R”.

Q2 When was it published?

It was published on October 15, 2018.

Q3 Describe the data

Hint: What’s the source of the data; what does the row represent; how many observations?; what are the variables; and what do they mean?

The data David Robinson analyzes comes from a CSV file titled “recent grads”. There are 173 rows, and each row represents one college major. The analysis centers on three variables: median, major, and major_category. Major is the name of the major, major_category is the broader field the major belongs to, and median is the median salary of recent graduates in that major. Together they are used to figure out which majors pay the most. Because a median is less sensitive to a few extreme salaries than an average, it gives a fair picture of typical earnings for each major.

Q4-Q5 Describe how Dave approached the analysis each step.

Hint: For example, importing data, understanding the data, data exploration, etc.

Dave began by picking out the categories he found most interesting and then visualizing them with graphs and charts. He plotted the data in a variety of ways, including scatter plots, bar graphs, box plots, and line graphs, to understand it from as many angles as possible. Robinson then summarized the data with box plots, which were the most effective way to compare salaries in his scenario.

Q6 Did you see anything in the video that you learned in class? Describe.

Yes, I noticed a variety of things in this video that I also learned in class. For example, when exploring the salaries and their medians, he used the ggplot() function, which we have used multiple times in class. What was different was that Dave used ggplot() to draw a histogram, which was a cool feature I wasn’t aware of. He then compared salaries across college majors using a box plot and a scatter plot, which we have also used in class for quizzes. When generating these graphs, it was also neat to see him use the group_by() function, which we have used before with data sets. Overall, I saw many of the same things we do in class, although Dave’s speed and articulation are on another level.
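
As a rough sketch of the group_by() and histogram steps described above, the snippet below groups a small made-up table by major category, takes the median salary per group, and builds a histogram with ggplot(). The column names (major, major_category, median) follow the ones used in this write-up, and the object names and all numbers are invented for illustration only. They are not values from the screencast's data set, and this is not necessarily the exact code Dave typed.

```r
library(dplyr)
library(ggplot2)

# Toy data: every value here is invented for illustration,
# not taken from the real recent grads file
grads <- tibble(
  major = c("A", "B", "C", "D", "E", "F"),
  major_category = c("Engineering", "Engineering", "Health",
                     "Health", "Arts", "Arts"),
  median = c(60000, 70000, 45000, 55000, 30000, 34000)
)

# Median salary per major category, highest first.
# stats::median() is spelled out so the function is not confused
# with the column that is also called `median`.
by_category <- grads %>%
  group_by(major_category) %>%
  summarise(median_salary = stats::median(median)) %>%
  arrange(desc(median_salary))

# Histogram of the major-level median salaries
salary_hist <- ggplot(grads, aes(x = median)) +
  geom_histogram(bins = 5)
```

Printing by_category shows one row per category, and printing salary_hist draws the histogram.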

Q7 What is a major finding from the analysis.

One major finding that stuck out to me has to do with the coding itself rather than the data set. Dave talked about a few different ways to sort the categories when plotting with ggplot(). One of the more interesting ways was the fct_reorder() function, called as fct_reorder(major_category, median). This orders the major categories by their median salary, so the plot reads from highest to lowest salary and is extremely digestible for anyone reading it.
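
To make the reordering idea concrete, here is a minimal sketch that assumes the function in question is fct_reorder() from the forcats package. The data is a made-up table with invented numbers, not the screencast's data set, and the object names are hypothetical.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Toy data: values are invented for illustration only
grads <- tibble(
  major_category = c("Engineering", "Engineering", "Health",
                     "Health", "Arts", "Arts"),
  median = c(60000, 70000, 45000, 55000, 30000, 34000)
)

# fct_reorder() orders the category levels by a summary of `median`
# within each category (the default summary is the median), lowest first
reordered <- grads %>%
  mutate(major_category = fct_reorder(major_category, median))

# Box plot with the categories in salary order instead of alphabetical order
salary_boxplot <- ggplot(reordered, aes(x = major_category, y = median)) +
  geom_boxplot() +
  coord_flip()
```

Without fct_reorder() the categories would appear alphabetically. With it, and with coord_flip() turning the plot on its side, the highest-paying category ends up at the top.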

Q8 What is the most interesting thing you really liked about the analysis.

I really liked how visually expressive Dave was with the data. He displayed it in a variety of ways: graphs, charts, box plots, and scatter plots. All were data visualization techniques we learned in class, but Dave explained them in a way that put them in a new light. Robinson also went through the pros and cons of each graph, which helped me understand even better what the numbers really mean.

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.