CO2 Emissions Final Project

Introduction

What does the graph represent?

This visualization, from a New York Times article, illustrates the evolution of global CO2 emissions from 1850 to 2014. Its aim is to compare the pollution produced by developed and developing countries, and to show the share contributed by today's superpowers.

A closer look reveals some trends across the regions. CO2 emissions stayed low until about 1950; then, during the economic boom that followed the end of WW2, they started growing exponentially and have not stopped since. This is known as the Great Acceleration.

We can also observe an anomaly in Russia around 1990, with a significant decline in the curve, presumably due to the collapse of the USSR. Additionally, there is an evident dominance of the United States, which China starts to overtake from 2000 onwards, when the country experienced an immense increase in pollution.

Lastly, when we compare the two main groups, we can observe that developed countries have seen a slight decrease in pollution, most likely due to the ecological awareness of their populations and the new laws and regulations put in place to preserve the planet. Meanwhile, developing countries have significantly increased their emissions.

Source: NYT Climate Change Graphs

Why did I choose this graph?

I picked this graph because its topic affects all of us equally, and it is important to raise awareness about it. Besides, the chart is visually appealing, and it immediately caught my attention because of its colours and memorable design. It does a great job of explaining a complex problem with a simple image and of highlighting the crucial fact that pollution is a current problem everywhere.

However, it is still imperfect: it lacks clarity and needs a few improvements to be easier to understand. I figured this would be an amazing challenge for me and a good way of strengthening my coding skills, considering that we haven’t worked much with stacked area charts and that the plot holds a considerable amount of data.

Critical evaluation of the graph

To determine whether the visualization is effective or not, I have based my judgment on the criteria seen during this course.

I will start with accuracy, which I consider the major flaw of this graph, since it is extremely difficult to estimate the values correctly. Several factors contribute to this: there are no gridlines to help read off a more precise number; the stacked areas make approximation harder, especially for the ones at the top; and, most importantly, the y-axis is misleading. In this type of chart the areas are piled on top of each other, so according to the data set used for the replication the values should add up to a total of about 90 billion, but here the axis only reaches 30. For instance, in 2014 China appears to have produced between 20 and 25 billion metric tons of CO2, when the real figure was over 26.

Moving on to discriminability, it was nicely done: the white edges help us see where one region ends and the next begins, and the colours were well chosen, since there is a notable contrast between the two groups. However, the stacked area format makes it quite challenging to compare the categories with each other, particularly when they are far apart.

Continuing with separability, the three types of channels (colour, colour intensity and x-y position) work well together and do not alter the intended message.

When it comes to pop-outs, there are only two in this case, so the graph draws the viewer's attention to the two most impactful countries by darkening their areas.

Finally, we come to the greatest strength of the graph: the grouping. The designer did an incredible job of organizing the categories, showing the two main groups (developed and developing economies) and, within each of them, the principal country that contributed to the total. It is intuitive, and the legend adds extra support for easier comprehension.

Overall, it is an acceptable graph that displays the information decently, but it still has room for improvement.

Replication

Before we get started please make sure you have all of the following packages installed by running these commands:

if (!require("ggplot2")) install.packages("ggplot2")
if (!require("dplyr")) install.packages("dplyr")

Data Process

Since this graph was particularly rich in data, it was impossible to determine the true value for each region without a pre-existing data set. Even so, the most accurate one I was able to find didn’t entirely match the chart; therefore, a major part of my work was to modify the data set to keep only the information I needed.

library(ggplot2)
data <- read.csv("nation.1751_2017.csv")

Once I had analysed it in depth, I made some small alterations, such as renaming the columns to avoid any confusion, and ignored all the unnecessary ones. I also discarded all the data outside the period 1850-2014.

#renaming the columns
colnames(data)[2] <- "Year"
colnames(data)[1] <- "Nation"
colnames(data)[3] <- "Total"

#creating vectors with all the values from 1850 to 2014
Year <- NULL
Nation <- NULL
Total <- NULL
y <- 5
while (y <= nrow(data)) {
  if(data$Year[y] >= 1850 & data$Year[y] <= 2014){
    Year[y] <- data$Year[y]
    Nation[y] <- data$Nation[y]
    Total[y] <- data$Total[y]
  }
  y <- y + 1
}
#creating a data frame with all of the filtered data
data2 <- data.frame(
  Nation = Nation,
  Year = Year,
  Emissions = Total)

#deleting every empty row
data2 <- data2[!apply(is.na(data2), 1, all), ]
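
As a side note, the same filtering step could be sketched more compactly with dplyr. This is only an illustration on made-up data: the data frame toy and its values are hypothetical, and it assumes the columns have already been renamed to Nation, Year and Total as above.

```r
library(dplyr)

# Hypothetical toy data standing in for the renamed table (values invented)
toy <- data.frame(
  Nation = c("INDIA ", "INDIA ", "INDIA ", "INDIA "),
  Year   = c("1849", "1850", "2014", "2015"),
  Total  = c("10", "20", "30", "40"),
  stringsAsFactors = FALSE
)

toy_filtered <- toy %>%
  # convert the text columns to numbers before comparing them
  mutate(Year = as.numeric(Year), Total = as.numeric(Total)) %>%
  # keep only the rows inside the 1850-2014 window
  filter(Year >= 1850, Year <= 2014) %>%
  # use the same column name as the replication code
  rename(Emissions = Total)

toy_filtered
```

Because filter() drops the unwanted rows directly, no empty rows are created and the clean-up step is not needed in this variant.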

The first essential step was creating the regions. To do so, I generated a data frame containing only the name of each country (islands and historical territories included) so I could get a better view of what I had to work with. Although optional, this step made it much easier to create the two regions that hold more than one country. With the help of the 2014 WESP country classification PDF, I made separate vectors for the European Union and for the other developed countries. This is where I encountered my first difficulty: many countries have gone through historical changes, so from one year to the next a name could change completely and would no longer be associated automatically with the right group. By comparing each name with the table and paying close attention to spelling, I was able to place every nation correctly. I then ordered the regions to match the original exactly.

#creating a new data frame with the names of every country so we can have a clearer vision of the data
data1 <- data.frame(
  Country = unique(data$Nation))

#region groups according to 2014wesp_country_classification.pdf
EU_countries <- c( "AUSTRIA ", "BELGIUM ", "BULGARIA ", "CROATIA ", "CYPRUS ", "CZECH REPUBLIC ","DENMARK ", "ESTONIA ", "FINLAND ", "FRANCE (INCLUDING MONACO) ", "GERMANY ", "GREECE ", "HUNGARY ","IRELAND ", "ITALY (INCLUDING SAN MARINO) ", "LATVIA ", "LITHUANIA ", "LUXEMBOURG ", "MALTA ","ST. PIERRE & MIQUELON ","NETHERLANDS ", "POLAND ", "PORTUGAL ", "ROMANIA ", "SLOVAKIA ", "SLOVENIA ","SPAIN ", "SWEDEN ", "UNITED KINGDOM ", "SAINT MARTIN (DUTCH PORTION) ","FORMER GERMAN DEMOCRATIC REPUBLIC " )

other_developed <- c("ICELAND ", "NORWAY ", "SWITZERLAND ", "AUSTRALIA ", "CANADA ", "JAPAN ","NEW ZEALAND ", "JAPAN (EXCLUDING THE RUYUKU ISLANDS) ")

#creating the region column
i <- 1
data2$Region <- character(nrow(data2))
while(i <= nrow(data2)) {
  if (data2$Nation[i] == "UNITED STATES OF AMERICA ") {
    data2$Region[i] <- "United States"
  } else if (data2$Nation[i] == "CHINA (MAINLAND) " | 
             data2$Nation[i] == "MACAU SPECIAL ADMINSTRATIVE REGION OF CHINA ") {
    data2$Region[i] <- "China"
  } else if (data2$Nation[i] == "INDIA ") {
    data2$Region[i] <- "India"
  } else if (data2$Nation[i] == "RUSSIAN FEDERATION " |
             data2$Nation[i] == "USSR ") {
    data2$Region[i] <- "Russia"
  } else if (is.element(data2$Nation[i], EU_countries)) {
    data2$Region[i] <- "European Union"
  } else if (is.element(data2$Nation[i], other_developed)) {
    data2$Region[i] <- "Other developed"
  } else{
    data2$Region[i] <- "Rest of world"
  }
  i <- i + 1
}

#order rows as in the original
data2$Region <- factor(
  data2$Region,
  levels = c(
    "Rest of world",
    "India",
    "China",
    "Russia",
    "Other developed",
    "European Union",
    "United States"
  )
)
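
Since exact spelling and trailing spaces were the main source of mismatches, one possible safeguard is to normalise the names before comparing them. The sketch below is an assumption-laden illustration, not the code used for the replication: toy, eu_demo and NationClean are hypothetical names, and the lookup vector is only a small subset.

```r
library(dplyr)

# Hypothetical toy data; note the inconsistent trailing spaces
toy <- data.frame(
  Nation = c("UNITED STATES OF AMERICA ", "FRANCE (INCLUDING MONACO)",
             "USSR ", "BRAZIL "),
  stringsAsFactors = FALSE
)

# Illustrative subset of a lookup vector, written without trailing spaces
eu_demo <- c("FRANCE (INCLUDING MONACO)", "GERMANY")

toy <- toy %>%
  mutate(
    # strip leading and trailing whitespace so both spellings match
    NationClean = trimws(Nation),
    # conditions are checked in order; the first match wins
    Region = case_when(
      NationClean == "UNITED STATES OF AMERICA" ~ "United States",
      NationClean %in% c("RUSSIAN FEDERATION", "USSR") ~ "Russia",
      NationClean %in% eu_demo ~ "European Union",
      TRUE ~ "Rest of world"
    )
  )

toy
```

case_when() works on the whole column at once, so it replaces the row-by-row loop, while trimws() makes the comparison independent of stray spaces.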

The following step was creating the two categories that form the legend. Although it was not required, I decided to link them to their actual regions; this procedure could be skipped, since it has no visible effect on the areas.

#legend categories
i <- 1
Legend <- NULL
while (i <= nrow(data2)) {
  if(data2$Region[i] == "Other developed" |data2$Region[i] == "European Union"
     | data2$Region[i] == "United States"){
    Legend[i] <- "Developed economies"
  }else{
    Legend[i] <- "Other countries"
  }
  i <- i + 1
}
data2$Legend <- Legend

Another problem I had to face was that when I was plotting the data the axis text was overlapping, and no data would appear because the program was considering the years and emissions as characters. The solution was as simple as converting those into numerical data.

# Convert the data into numeric values so it can be correctly plotted
data2$Year <- as.numeric(data2$Year)
data2$Emissions <- as.numeric(data2$Emissions)

Furthermore, I noticed that the areas didn’t have the same shape as the ones from the original since the program was plotting it by nations instead of the assigned regions, so I summed it up by region and divided by 100,000 to fit the scale.

#summing up the values by region
library(dplyr)
data3 <- summarise(group_by(data2, Region, Year, Legend),
                   EmissionsByRegion = sum(Emissions))

#new scale matching the original 
data3 <- mutate(data3, EmissionsNewScale = EmissionsByRegion/100000)
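
It is worth checking the units of the data set before labelling the axis. As a hedged illustration only: if the file reports emissions in thousand metric tons of carbon (an assumption that should be verified against the data set's documentation), the conversion to billion metric tons of CO2 would look like the sketch below. The input value is invented for the example.

```r
# Assumption: the raw value is in thousand metric tons of carbon
thousand_t_carbon <- 9800000   # hypothetical yearly total, for illustration only

# Mass ratio between a CO2 molecule (44) and its carbon atom (12)
carbon_to_co2 <- 44 / 12

# thousand tons -> tons, carbon -> CO2, tons -> billion tons
billion_t_co2 <- thousand_t_carbon * 1000 * carbon_to_co2 / 1e9

# The same input on the scale used above (division by 100,000)
plot_scale <- thousand_t_carbon / 100000

billion_t_co2
plot_scale
```

Under that assumption the two scales differ by a constant factor, so the shape of the areas is unaffected and only the axis labels change.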

Plot Process

Now that I had completely filtered, cleaned and organized the data set, I was ready to start plotting the graph. The obvious place to begin was to assign the axes (years horizontally, emissions vertically) and the main data representation, in this case the regions.

#plotting the original graph
p <- ggplot(data3) +
  #the areas
  aes(x = Year, y = EmissionsNewScale, fill = Region)+
  geom_area(colour = "white",
            #linewidth replaces size for lines since ggplot2 3.4.0
            linewidth = 0.2,
            show.legend = FALSE)
p

For the geoms I used the function geom_area and hid the legend, which was occupying too much space and didn’t bring any useful information. However, the default colours were nothing like the ones I intended to have, so I set all of them manually, using the website htmlcolorcodes.com to match the exact shades. In the same function I also set the colours of “Developed economies” and “Other countries” and restricted the breaks to those two groups; otherwise every other category would also appear in the legend.

#changing the colors to match the original graph by looking up their codes on https://htmlcolorcodes.com/
p <- p + scale_fill_manual(
    values = c(
      "China" = "#F28124",
      "Russia" = "#F5A15B",
      "India" = "#F5A15B",
      "Rest of world" = "#F5A15B",
      "Other developed" = "#86BBCC",
      "European Union" = "#86BBCC",
      "United States" = "#529CBA",
      "Developed economies" = "#86BBCC",
      "Other countries"     = "#F5A15B"
    ),
    #Only those two legend entries will appear with the associated color
    breaks = c("Developed economies", "Other countries")
  )
p

Furthermore, I modified the axes so they would have the same scale and number of breaks as in the original. As I mentioned previously, though, the original scale does not match the data, and the only way to replicate it was to relabel the axis by hand. I placed the same number of breaks at the same relative positions as in the original, and then assigned each one the label it carries in the original graph.

#changing the axis to match the original
p <- p +
  scale_x_continuous(breaks = c(1850, 1900, 1950, 2000),
                     expand = c(0, 0),
                     limits = c(1840, 2014)) +
  scale_y_continuous(breaks = seq(0,100, by=15),
                     labels = c(" ", "5", "10", "15", "20", "25", "30 billion\nmetric tons"),
                     position = "right",
                     expand = c(0,0),
                     limits = c(0, 98))+
  theme(
    axis.ticks = element_line(colour = "grey"),
    axis.text = element_text(colour = "grey", size = 19),
    axis.text.y = element_text(hjust= 0))
p

This brings us to the next step and, in my opinion, one of the biggest challenges I had to deal with in this project: plotting the legend. As we know, ggplot2 is designed to generate a legend automatically from the data mapped in the geoms; however, when the legend doesn’t match the categories already used in the visualization, things get much more complicated. The technique I decided to use was to create an invisible geom by setting alpha to 0, so that, despite being present in the graph, it would not be visible.

#legend
#making invisible areas so the categories can appear in the legend
p <- p +
  geom_area(aes(fill = Legend), alpha = 0) +
  #placement and aesthetic of the legend
  guides(fill = guide_legend(override.aes = list(alpha = 1))) +
  theme(
    legend.title = element_blank(),
    legend.position = c(0.15, 0.15),
    legend.text = element_text(size = 17),
    legend.key.size = unit(2, "lines")
  )
p
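
The invisible-layer technique can be isolated in a small stand-alone sketch. Everything in it is hypothetical (the toy data, the names toy_groups and legend_demo, the colours), and unlike the code above it aggregates the values per group before drawing the invisible layer, which is a simplification chosen for the example.

```r
library(ggplot2)

# Hypothetical toy data: three regions, each belonging to a broader group
toy <- data.frame(
  Year   = rep(c(2000, 2010), times = 3),
  Value  = c(1, 2, 2, 3, 1, 4),
  Region = rep(c("A", "B", "C"), each = 2),
  Group  = rep(c("Developed", "Developed", "Other"), each = 2)
)

# One value per group and year for the invisible layer
toy_groups <- aggregate(Value ~ Year + Group, data = toy, FUN = sum)

legend_demo <- ggplot(mapping = aes(x = Year, y = Value)) +
  # visible layer: filled by region, kept out of the legend
  geom_area(data = toy, aes(fill = Region),
            colour = "white", show.legend = FALSE) +
  # invisible layer: exists only so that Group reaches the fill legend
  geom_area(data = toy_groups, aes(fill = Group), alpha = 0) +
  scale_fill_manual(
    values = c(A = "#529CBA", B = "#86BBCC", C = "#F5A15B",
               Developed = "#86BBCC", Other = "#F5A15B"),
    # only the two groups are listed in the legend
    breaks = c("Developed", "Other")
  ) +
  # legend keys would inherit alpha = 0, so make them opaque again
  guides(fill = guide_legend(override.aes = list(alpha = 1)))

legend_demo
```

The key point is the pairing of alpha = 0 on the layer with override.aes in the guide: the first hides the areas, the second keeps the legend keys visible.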

Another complication was adding the title and the names of the countries. First, adding the title was tricky only because it wasn’t intuitive: instead of using the usual title function, which would leave it outside of the panel, I had to add it as an annotation so it would stay inside the graph. Second, the region names were quite demanding even though the code itself is not complex: since there was no data attached to them, I had to approximate their positions and adjust them until I got as close as possible to the original.

#Title & Subtitle
p <- p +
  annotate("text", label = "CO2 emitted worldwide", fontface = "bold", x = 1945, y = 75, size = 8.5, hjust = 0)+
  annotate("text",label = "Between 1850-2014", size= 6, x = 1945, y = 72, hjust =0)+
  #country names
  annotate("text", label= "Rest of\nworld", size= 6, x= 1995, y=52)+
  annotate("text", label= "India", size= 6, x= 1998, y=46)+
  annotate("text", label= "China", size= 6, x= 2006, y=43)+
  annotate("text", label= "Russia", size= 6, x= 1982, y=31)+
  annotate("text", label= "Other\ndeveloped", size= 6, x= 2002, y=28.5)+
  annotate("text", label= "European Union", size= 6, x= 1995, y=19)+
  annotate("text", label= "United States", size= 6, x= 2002, y=7.7)

Lastly, I had to match the aesthetic to the original with the function theme. My base was theme_minimal, and from there I removed the remaining grid lines and needless titles and made all the necessary changes to the colours, size, spacing, etc.

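#overall aesthetic
#note: theme_minimal() is a complete theme, so adding it here replaces the theme() settings from the earlier steps;
#if the axis text or legend styling is lost, re-run those theme() calls after this one
p <- p + theme_minimal() +
  theme(
    axis.title = element_blank(),
    panel.grid = element_blank()
  )
p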

This is the result you should obtain:

Improvement

Although the first chart was already satisfactory and conveyed the message appropriately, the viewer may still struggle to identify the information clearly and compare it easily.

I started the improvement process by cutting off all the data preceding 1900. In those 50 years (1850-1900) the evolution was minimal, and for all the space it occupied it did not provide any information essential to understanding the graph. By removing this data, we focus on the most significant development but still have enough context to understand that the situation is far more critical now than it was back then. On top of that, I extended the chart data to 2017, because it would have been a shame not to include the most recent information given by the data set. This particularly contributed to increasing the accuracy, which was an important issue that needed to be solved.

#IMPROVED VERSION

#creating a new data set 
#values from 1900 to the most recent one (in this case: 2017)
Year1 <- NULL
Nation1 <- NULL
Total1 <- NULL

y <- 5
while (y <= nrow(data)) {
  if(data$Year[y] >= 1900 & data$Year[y] <= 2017){
    Year1[y] <- data$Year[y]
    Nation1[y] <- data$Nation[y]
    Total1[y] <- data$Total[y]
  }
  y <- y + 1
}

data4 <- data.frame(
  Nation = Nation1,
  Year = Year1,
  Emissions = Total1
)

#deleting all of the empty rows 
data4 <- data4[!apply(is.na(data4), 1, all), ]

The most crucial change concerned the y-axis, even though it was a very quick and easy fix. As I mentioned before, the original scale was misleading, giving the impression that all the areas together summed to 30 billion. All I had to do was keep the default labels for the breaks instead of overriding them. For more precision, I also added more values on the x-axis, with breaks every 25 years so that the axis would not become overcrowded.
(This step will be shown in the final code)

The chart’s primary objective was to compare the regions, but the previous version made that nearly impossible, so many changes were needed to achieve that goal. To begin with, I separated the regions into facets, merged India into the rest of the developing countries and changed Russia to contain the former USSR countries (excluding the ones that joined the EU) for a fairer comparison. This grouping is based on the idea that every developing economy has its equivalent on the top row: China faces the corresponding superpower, the United States; the former USSR (a former union of countries) faces the European Union; and the rest of the world is comparable to the remaining developed economies.

#Former USSR countries (excluding the ones that joined the EU or became a developed economy)
#Lithuania is left out because it is an EU member; every name keeps the trailing space used in the data set
USSR_countries <- c( "ARMENIA ", "AZERBAIJAN ", "BELARUS ", "GEORGIA ","KAZAKHSTAN ", "KYRGYZSTAN ", "REPUBLIC OF MOLDOVA ","RUSSIAN FEDERATION ", "TAJIKISTAN ","TURKMENISTAN ", "UKRAINE ", "UZBEKISTAN ",  "USSR " )

#New classification of the data with India being a part of the rest of the world
i <- 1
Region <- character(nrow(data4))
while(i <= nrow(data4)) {
  if (data4$Nation[i] == "UNITED STATES OF AMERICA ") {
    Region[i] <- "United States"
  } else if (data4$Nation[i] == "CHINA (MAINLAND) ") {
    Region[i] <- "China"
  } else if (is.element(data4$Nation[i], USSR_countries)) {
    Region[i] <- "Former USSR countries"
  } else if (is.element(data4$Nation[i], EU_countries)) {
    Region[i] <- "European Union"
  } else if (is.element(data4$Nation[i], other_developed)) {
    Region[i] <- "Other developed"
  } else{
    Region[i] <- "Rest of world"
  }
  i <- i + 1
}

data4$Region <- Region

#Organizing the regions the way I want it to be in the graph
data4$Region <-factor(
  data4$Region,
  levels = c("United States",
             "European Union",
             "Other developed",
             "China",
             "Former USSR countries",
             "Rest of world")
)

On top of that, I slightly modified the colours so that their intensity increases with each region's impact on the environment.

To enhance the accuracy even more, I added labels that display the maximum value of every region and an arrow that points towards the highest point of the area. It was obviously impossible to show the emissions per year for each region, because with so much data the numbers would have overlapped. Nevertheless, the maximum value alone gives a better idea of the impact each territory has had globally and, most importantly, of when it happened, which indicates whether the situation has improved or not.

#Transforming the years and emissions into numerical data instead of characters so we are able to plot them
data4$Year <- as.numeric(data4$Year)
data4$Emissions <- as.numeric(data4$Emissions) 
#Adding a new column with the sum of emissions for each region and rescaling it to match the axis
data4 <- summarise(group_by(data4, Region, Year),
                   EmissionsByRegion = sum(Emissions)/100000)

## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.

#Calculating what the maximum value is and when it happened
MaxValue <- slice_max(group_by(data4, Region),
                      order_by= EmissionsByRegion, n= 1)

#rounding the values so the annotations won't have large numbers
MaxValue$EmissionsByRegion <- round(MaxValue$EmissionsByRegion, 2)
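
To make the peak lookup easier to follow, here is a minimal sketch of the same idea on made-up numbers. The data frame toy and the result toy_max are hypothetical, and with_ties = FALSE is an extra choice made here so that exactly one row per region is returned even if two years share the maximum.

```r
library(dplyr)

# Hypothetical toy data: two regions observed over three years
toy <- data.frame(
  Region = c("A", "A", "A", "B", "B", "B"),
  Year = c(2000, 2001, 2002, 2000, 2001, 2002),
  EmissionsByRegion = c(1.234, 3.456, 2.1, 5.5, 4.4, 6.789)
)

toy_max <- toy %>%
  group_by(Region) %>%
  # keep the single row with the highest value in each region
  slice_max(order_by = EmissionsByRegion, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  # round so the labels stay short
  mutate(EmissionsByRegion = round(EmissionsByRegion, 2))

toy_max
```

Each resulting row carries both the peak value and the year in which it occurred, which is what the label and the arrow need.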

Then we get to the legend, which is very similar to the previous one.

#Categories for the legend
i <- 1
Economy <- NULL
while (i <= nrow(data4)) {
  if(data4$Region[i] == "Other developed" |data4$Region[i] == "European Union"| data4$Region[i] == "United States"){
    Economy[i] <- "Developed economies"
  }else{
    Economy[i] <- "Developing economies"
  }
  i <- i + 1
}

data4$Economy <- Economy

To simplify comparison, I added four last elements to provide context. The first one is a grey background that represents global CO2 emissions, and I labelled it that way in the chart to avoid any possible confusion. This was an easy addition to make, as I only needed to create a new data frame with the sum of each region and then plot it just like any other area.

#Creating a background so we can compare the regions to the total
background <- summarise(group_by(data4, Year),
                        EmissionsEvolution = sum(EmissionsByRegion))

I also included a dashed line, plotted with geom_line, showing the regional average; it lets us see immediately how a territory's impact compares at any given point in time. The most complex part was introducing a legend entry linked to the line that would describe its function. Since ggplot only shows a legend for data mapped inside aes, I had to add a column to the data frame used for the mean, containing the name that I wanted to be shown in the legend, and map it in aes as linetype = Name. With the function scale_linetype_manual I was then able to assign it the dashed type.

#Average of the regions for each year
Average <- summarise(group_by(data4, Year), average= mean(EmissionsByRegion))
Average$Name <- rep("Regional Average", length(Average$Year))
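
The linetype legend trick can be tried on its own with a few invented numbers. In this sketch avg_demo and line_demo are hypothetical names and the values carry no meaning; only the mechanism is the point.

```r
library(ggplot2)

# Hypothetical toy averages; Name is constant and exists only to feed the legend
avg_demo <- data.frame(
  Year = c(2000, 2005, 2010),
  average = c(3, 4, 5),
  Name = "Regional Average"
)

line_demo <- ggplot(avg_demo, aes(x = Year, y = average)) +
  # mapping linetype inside aes() is what makes ggplot2 draw a legend entry
  geom_line(aes(linetype = Name), colour = "black") +
  # the scale then decides which line type that entry receives
  scale_linetype_manual(name = "",
                        values = c("Regional Average" = "dashed"))

line_demo
```

Setting linetype = "dashed" outside aes() would draw the same line but without any legend entry, which is why the extra column is needed.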

Another small addition is a ranking from the most to the least polluting region in 2017; I plotted each region's position on the side of its facet to make it more intuitive for the viewers.

#Ranking for 2017
Ranking <- filter(data4, Year== 2017)
Ranking <- arrange(Ranking, desc(EmissionsByRegion))
#one rank per region, from the highest emitter to the lowest
Ranking$Rank <- seq_len(nrow(Ranking))
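
An alternative way to obtain the ranking, sketched here on invented values, is to let dplyr compute the rank from the values themselves instead of relying on the row order. The data frame rank_demo is hypothetical.

```r
library(dplyr)

# Hypothetical emissions for three regions in a single year
rank_demo <- data.frame(
  Region = c("A", "B", "C"),
  EmissionsByRegion = c(5.2, 9.9, 1.3)
)

rank_demo <- rank_demo %>%
  # rank 1 goes to the largest value; ties would share the same rank
  mutate(Rank = min_rank(desc(EmissionsByRegion))) %>%
  # sort so the table reads from the highest emitter down
  arrange(Rank)

rank_demo
```

With this variant the rank stays correct even if the rows are reordered later or the number of regions changes.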

The last added element was a variation rate for each region since the beginning of the century (2000-2017), which gives more relative context and helps to see how the situation has progressed.

#Variation since 2000
Variation <- filter(data4,Year == 2000| Year == 2017 )
#grouping explicitly by region so that one row per region is returned
Variation <- summarise(group_by(Variation, Region),
                       Emissions_2000 = EmissionsByRegion[Year == 2000],
                       Emissions_2017 = EmissionsByRegion[Year == 2017])

Variation <- mutate(Variation,
                    VariationPercentage = (Emissions_2017 - Emissions_2000)/Emissions_2000*100)

i <- 1
sign <- NULL
while (i <= nrow(Variation)) {
  if(Variation$VariationPercentage[i] > 0){
    sign[i] <- "↑"
  }else{
    sign[i] <- "↓"
  }
  i <- i + 1
}

Variation$VariationPercentage <- paste(sign, round(Variation$VariationPercentage, 2), "%")
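
The variation rate is a plain percentage change, (end - start) / start * 100. The sketch below shows the arithmetic and the arrow prefix in a vectorised form; percent_change and format_change are hypothetical helpers written for this illustration, not functions used in the project.

```r
# Hypothetical helper: percentage change between two values
percent_change <- function(start, end) {
  (end - start) / start * 100
}

# Hypothetical helper: prefix an up or down arrow and round to two decimals
format_change <- function(pct) {
  arrow <- ifelse(pct > 0, "\u2191", "\u2193")
  paste(arrow, round(pct, 2), "%")
}

# A region going from 4 to 5 units grows by 25 percent
growth <- percent_change(4, 5)
# A region going from 10 to 9 units shrinks by 10 percent
decline <- percent_change(10, 9)

format_change(c(growth, decline))
```

Because ifelse() works on whole vectors, the labels for all regions can be built in one call instead of a loop.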

Ultimately, I used the function theme and coded the spacing, colours, axis lines, titles, and other aesthetic elements to make the final adjustment to the graph.

Now here’s what the code for the plot should look like:

#When you run this command please remove "q <-" and expand the image to cover the whole screen to get the real result
#New graph
q <- ggplot() +
  #background
  geom_area(
    data = background,
    aes(x = Year, y = EmissionsEvolution),
    fill = "grey",
    alpha = 0.3,
    color = "darkgrey",
    linewidth = 0.2
  )+
  #Labeling the background
  annotate(
    "text",
    label = "World\nEmissions",
    size = 5,
    x = 1996,
    y = 44,
    color = "white",
    fontface = "bold"
  ) +
  #Creating the areas
  geom_area(
    data = data4,
    aes(x = Year, y = EmissionsByRegion, fill = Region),
    colour = "white",
    linewidth = 0.6,
    show.legend = FALSE
  ) +
  #Faceting by Region
  facet_wrap( ~ Region) +
  labs(title = "Global CO2 Emissions by Region (1900–2017)",
       subtitle = "Comparing regional emissions with global totals",
       y = "CO2 emissions (billion metric tons)") +
  #Changing the axis
  scale_x_continuous(breaks = c(1900, 1925, 1950, 1975, 2000)) +
  scale_y_continuous(breaks = c(20, 40, 60, 80)) +
  #grid and margins
  theme_light() +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_line(linetype = "dotted", color = "black"),
    panel.grid.minor.y = element_blank(),
    plot.margin = margin(t = 10, b = 20, r = 15, l= 10)
  ) +
  #Colors of each region and economy associated, with increasing intensity depending on the global impact of each country
  scale_fill_manual(
    values = c(
      "China" = "#C65F0C",
      "Former USSR countries" = "#F1740E",
      "Rest of world" = "#F38D39",
      "Other developed" = "#86BBCC",
      "European Union" = "#3C7C96",
      "United States" = "#2F6175",
      "Developed economies" = "#3C7C96",
      "Developing economies"  = "#F1740E"
    ),
    breaks = c("Developed economies", "Developing economies")
  ) +
  #Format of the facets, spacing, titles and text for a better aesthetic
  theme(
    plot.title = element_text(size = 22, hjust = 0.5,
                              face = "bold", margin = margin(b= 10, t= 15)),
    plot.subtitle = element_text(
      size = 17.5,
      hjust = 0.5,
      color = "darkgrey",
      margin = margin(b= 15)
    ),
    axis.text.y = element_text(size = 17),
    axis.text.x = element_text(size = 14),
    axis.title.y = element_text(margin = margin(r = 20, l= 10)),
    axis.title.x = element_text(margin = margin(t= 20, b=5)),
    axis.title = element_text(size = 17, face = "bold"),
    strip.text = element_text(size = 14, color = "black"),
    strip.background = element_rect(fill = "white", color = "black"),
    panel.spacing = unit(0.75, "cm")
  ) +
  #annotation with the maximum value of each region
  geom_label(
    data = MaxValue,
    aes(x = Year, y = EmissionsByRegion, label = EmissionsByRegion),
    vjust = -1.3,
    size = 5.5
  ) +
  #Arrows that point the peak
  geom_segment(
    data = MaxValue,
    aes(
      x = Year,
      xend = Year,
      y = EmissionsByRegion + 10,
      yend = EmissionsByRegion
    ),
    arrow = arrow(length = unit(0.15, "cm"), type = "closed"),
    linewidth = 0.75
  ) +
  #Invisible areas to make a legend
  geom_area(data = data4,
            aes(x = Year, y = EmissionsByRegion, fill = Economy),
            alpha = 0) +
  #Average of a region over the years
  geom_line(data = Average,
            aes(x = Year, y = average, linetype = Name),
            color = "black") +
  #Making the legend opaque and readable
  guides(fill = guide_legend(override.aes = list(alpha = 1), order = 1)) +
  #Legend placement and aesthetic
  theme(
    legend.title = element_blank(),
    legend.text = element_text(size = 17.5),
    legend.position = "bottom"
  ) +
  #Legend for the average line
  scale_linetype_manual(name = "",
                        values = c("Regional Average" = "dashed")) +
  #Ranking
  geom_text(data = Ranking,
            aes(
              x = 2019,
              y = EmissionsByRegion,
              label = paste0("#", Rank)
            ),
            hjust = 0) +
  #Variation
  annotate(
    "text",
    label = "Variation 2000-2017: ",
    x = 1910,
    y = 75,
    color = "black",
    size = 4,
    hjust = 0
  ) +
  geom_text(
    data = Variation,
    aes(x = 1910, y = 70, label = VariationPercentage),
    color = "black",
    size = 4,
    hjust = 0,
    fontface = "bold"
  )

Improved Version

Now you should get something like this:

Conclusion

This project allowed me to gain a lot of knowledge about ggplot and programming in general. I learned how to properly filter, organize and create new data sets. It helped me fully understand the qualities that make a graph effective and how to enhance one while keeping it visually pleasing. The most surprising thing about this assignment was realizing how many media outlets, including major ones like the New York Times, use ineffective visualizations.

Overall, I really enjoyed doing this project and went out of my way to replicate and improve the graph as much as I could.

Source: “Global, Regional, and National Fossil-Fuel CO₂ Emissions (1751 – 2014, V.2017)” by Boden, Marland & Andres, Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory; country classifications via United Nations. Total CO2 emissions are from fossil fuels and cement production and do not include land use and forestry-related emissions. In the worldwide carbon emissions graphic (middle), Russia data includes the U.S.S.R. through 1991, but only the Russian Federation afterward.

https://www.nytimes.com/2019/02/28/learning/teach-about-climate-change-with-these-24-new-york-times-graphs.html
https://www.un.org/en/development/desa/policy/wesp/wesp_current/2014wesp_country_classification.pdf
https://zenodo.org/records/4281271