A list of at least 3 columns (or values) in your data which are unclear until you read the documentation. E.g., this could be a column name, or just some value inside a cell of your data. Why do you think they chose to encode the data the way they did? What could have happened if you didn’t read the documentation?

  1. 3 values that were unclear before reading documentation. Explain why unclear, why it was titled that way, and explain what could have happened if we did not read documentation.

    a. Total final energy consumption: This did not sound like a problematic column title until I thought about it: total final energy consumption for what time period? I then looked around and realized that this was yearly data, where each row has a year associated with it. Without checking, I could have read this column as the total energy consumption for all time, which would have given me drastically different results.

    b. Renewable energy share of TFEC: This was unclear because of the acronym TFEC. I had to look at the documentation to understand that it is short for total final energy consumption (TFEC). This column would have been useless to me if I had not been able to decode the acronym.

    c. Renewable Electricity Output: This was unclear because I was not sure whether it meant the country was generating this much electricity or using it. It was also unclear which sources the renewable energy came from (wind, solar, ...) or its origin (supplied by the country it is associated with, or by another country?). Luckily, the documentation states that the value is the sum of all renewable energy types, except hydro pumped storage. Unfortunately, the documentation does not mention origin, so I will assume these measurements come from the country they are listed under.

At least one element of your data that is unclear even after reading the documentation. You may need to do some digging, but is there anything about the data that your documentation does not explain?

  1. It is unclear to me why renewable energy consumption and renewable energy output do not line up very well. I would expect renewable energy consumption to nearly match renewable energy output, but some countries appear to consume far more renewable energy than they output. Given my assumption that the renewable energy output is all generated by the associated country, it seems reasonable to assume that renewable energy consumption covers not just what a country produces, but also energy bought from a neighboring country.

Build a visualization which uses a column of data that is affected by the issue you brought up in bullet #2, above. In this visualization, find a way to highlight the issue, and explain what is unclear and why it might be unclear. You can use color or an annotation, but also make sure to explain your thoughts using Markdown. Do you notice any significant risks? If so, what could you do to reduce negative consequences?

I am unsure why R is not plotting the chart with the correct y-axis (the numbers are not in order). Ideally, this chart would show how much renewable energy each country in a random sample is producing (y-axis), with the size of each point showing how much it is consuming. I would not say that this is a significant risk; I just have to be careful about how I word references to this column and how I analyze it.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(ggrepel)

energy <- read_delim("./590_FinalData1.csv", delim = ",")
## Rows: 6993 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (12): country_name, country_code, rural_electricity_access, total_popula...
## dbl  (1): year
## lgl  (1): full_pop_electricity_access
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
energy1 <- energy
energy1[energy1 == '..'] <- NA
#view(energy1)

energy_sample <- sample_n(energy1, 20, replace = TRUE)

ggplot(energy_sample, aes(x = country_code, y = ren_energy_output), size = ren_energy_cons) + geom_point()
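A likely cause of the out-of-order y-axis is visible in the import message above: 12 of the 14 columns were read as `chr`, presumably because missing values are written as `..`. If `ren_energy_output` is a character column, ggplot2 treats it as a discrete variable and sorts the labels alphabetically rather than numerically. In addition, `size = ren_energy_cons` sits outside `aes()`, so it is not mapped to the data. The following is a minimal sketch of one possible fix, not a verified correction of the chart above. The tibble `toy` and all of its values are hypothetical stand-ins invented for illustration, and the sketch assumes the two measurement columns contain only numbers and `..`.

```r
library(tidyverse)

# Hypothetical stand-in for a few rows of the real data.
# Country codes and values are invented for illustration only.
toy <- tibble(
  country_code      = c("AAA", "BBB", "CCC", "DDD"),
  ren_energy_output = c("9.5", "120.25", "..", "15"),
  ren_energy_cons   = c("30", "200.5", "12", "..")
)

# Turn the ".." placeholder into NA, then parse both columns as numbers.
toy_num <- toy %>%
  mutate(across(
    c(ren_energy_output, ren_energy_cons),
    ~ as.numeric(na_if(.x, ".."))
  ))

# size is now inside aes(), so point size is mapped to consumption.
# na.rm = TRUE drops rows with missing values without a warning.
p <- ggplot(
  toy_num,
  aes(x = country_code, y = ren_energy_output, size = ren_energy_cons)
) +
  geom_point(na.rm = TRUE)

print(p)
```

With the real file, passing `na = ".."` to `read_delim()` would probably let readr guess numeric types for these columns at import time, which would make the conversion step unnecessary. Note also that `sample_n(..., replace = TRUE)` can draw the same row more than once, and because the data is yearly, a random sample of rows mixes different years for different countries.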