Choose one of David Robinson’s tidytuesday screencasts, watch the video, and summarise. https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ
You must follow the instructions below to get credits for this assignment.
The title of the screencast that I chose is Tidy Tuesday screencast: analyzing plastic waste across countries.
This screencast was published May 27, 2019.
Hint: What’s the source of the data; what does the row represent; how many observations?; what are the variables; and what do they mean?
Dave uses data from the Global Plastic Waste data set. This data set is actually made up of three smaller data sets that together provide the needed information: coast_vs_waste, mismanaged_vs_gdp, and waste_vs_gdp. The first data set, coast_vs_waste, has six columns: entity (the country), code (a three-letter abbreviation of the country's name), year, mismanaged plastic waste in tonnes, the country's coastal population, and the country's total population. Each row represents one country in one year, which makes it easy to pull information from a specific year. The second data set, mismanaged_vs_gdp, is split into the columns entity, code, year, per capita mismanaged plastic waste, GDP per capita, and total population; its rows are also defined by the year of the information. The final data set, waste_vs_gdp, has columns labeled entity, code, year, per capita plastic waste, GDP per capita, and total population, and again its rows are defined by year.

One interesting thing Dave had to do with this data was clean all of it using the janitor package. He also used filters to make the data easier to read and more consistent. All three data sets had a large number of NA values in their columns, mostly from earlier years when the technology to collect the relevant information may not have existed. To make the data more relevant, Dave used a filter to keep only rows with no NA values, which reduced most of the data to 2010 to the present.
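To make this concrete, here is a minimal sketch of that import-and-clean step; the TidyTuesday URL, the file name, and the cleaned column name are my assumptions based on the repo layout for that week, not Dave's exact code:

```r
library(tidyverse)
library(janitor)

# Assumed URL for the 2019-05-21 TidyTuesday week; the exact file name may differ
coast_vs_waste <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-21/coastal-population-vs-mismanaged-plastic.csv") %>%
  clean_names()  # janitor converts the long, messy headers to snake_case

# Drop the early rows that are missing measurements (column name illustrative)
coast_vs_waste %>%
  filter(!is.na(mismanaged_plastic_waste_tonnes))
```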
Hint: For example, importing data, understanding the data, data exploration, etc.
When Dave was importing the data, he realized that he was working with "uncleaned" data, so he used the janitor package to clean each data set as he read it in. He also noticed that some variables were missing from prior years and were represented with NA, so he used filters to shrink the data sets down to only the rows that did not include any NAs. Because some of the column names were very lengthy, he renamed them as well. At first he did all of this separately for each data set, but he ended up building a function that did it for him as he imported the data, so that he was only presented with the data that was relevant and that he wanted. To make his analysis a little easier, Dave then joined the data sets into one. To easily analyze and organize the combined data, Dave opens the new data set with View(), which shows him an organized table that he can arrange in different ways: he can see which country has the largest population and the amount of waste it has, sort by mismanaged waste per capita to determine which country has the most waste per person, or try an assortment of other arrangements (see the sketch below). Dave also uses plots and graphs to visualize the data and check whether there is a visual cue pointing to a relationship, and he uses multiple tools to clean up his graphs and plots so they are easier to analyze and read.
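Here is a hedged sketch of what that helper function and join might look like; the function name, base URL, file names, and shortened column names are my assumptions, not a transcription of Dave's code:

```r
library(tidyverse)
library(janitor)

# Assumed base URL for that week's TidyTuesday files
base_url <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-21/"

# Hypothetical helper in the spirit of the one Dave built: read a CSV,
# clean the names, and drop every row that still contains an NA
read_plastics <- function(file) {
  read_csv(paste0(base_url, file)) %>%
    clean_names() %>%
    filter(if_all(everything(), ~ !is.na(.x)))
}

# Illustrative file names; join on the shared country/year keys.
# Columns duplicated across the data sets (e.g. total population) come
# back with .x/.y suffixes and can be dropped with select(); lengthy
# measurement names can be shortened with rename(), e.g. to
# gdp_per_capita and mismanaged_waste_per_capita.
plastic_waste <- read_plastics("coastal-population-vs-mismanaged-plastic.csv") %>%
  inner_join(read_plastics("per-capita-mismanaged-plastic-waste-vs-gdp-per-capita.csv"),
             by = c("entity", "code", "year")) %>%
  rename(country = entity)

# Inspect the result interactively, sorted however is useful
plastic_waste %>% arrange(desc(year)) %>% View()
```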
The two things that Dave utilized that I remember doing and learning in class were the creation of graphs and plots using ggplot() and geom_point(), and the cleaning of data using multiple tools. The functions ggplot() and geom_point() are very familiar, as we used them many times to analyze the data sets we encountered. I had problems cleaning data in class, but Dave used the janitor package and seemed to make the process pretty simple. I don't remember if we used the janitor package at all in our quizzes, but it seems like a helpful tool for cleaning data!
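As a refresher, here is a minimal ggplot() + geom_point() scatter of the kind Dave builds, applied to the joined data; the plastic_waste data frame and its shortened column names are the assumed ones carried over from the sketch above:

```r
library(tidyverse)

# Scatterplot of GDP per capita against mismanaged plastic waste per
# capita; log scales help because both variables span several orders
# of magnitude. Column names are assumptions from the earlier sketch.
plastic_waste %>%
  ggplot(aes(gdp_per_capita, mismanaged_waste_per_capita)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "GDP per capita",
       y = "Mismanaged plastic waste per capita")
```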
The major finding from this analysis is that there is a strong correlation between GDP per capita and CO2 emissions per capita, but little correlation between GDP per capita and mismanaged plastic waste per capita. In other words, countries that earn more "income" tend to have higher CO2 emissions compared to those that do not make a lot of money. The fact that there was little correlation between plastic waste and GDP somewhat surprised me; I would have thought that the more money available to a country, the more waste it would produce. However, it could be that these more developed countries are working against plastic pollution. This was a very interesting analysis.
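Out of curiosity, the weak relationship Dave eyeballed from the plot could also be quantified; this is my own hedged sketch using the assumed data frame and column names from above, not a step from the screencast:

```r
# Correlation between log GDP and log mismanaged waste per capita
# (column names are assumptions; zeros and NAs are dropped before logging)
plastic_waste %>%
  filter(gdp_per_capita > 0, mismanaged_waste_per_capita > 0) %>%
  summarize(r = cor(log10(gdp_per_capita), log10(mismanaged_waste_per_capita)))
```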
I enjoyed watching Dave seamlessly work with the data. I personally have a hard time understanding how to import the correct data, and especially how to manipulate it to my liking, so watching Dave manipulate, merge, and use the multiple data sets so easily was fun. I enjoy the analysis but struggle with the importing and manipulation. I also enjoyed the relevance of the data: pollution is one topic I always find very interesting, and learning more about it is always exciting!