DataDive7

Come up with 2 null hypotheses: determine the power level, alpha level, and minimum effect size for each, and explain.

Determine if you have enough data to perform a Neyman-Pearson hypothesis test. If you do, perform one and interpret results. If not, explain why.
Perform a Fisher-style test for significance, and interpret the p-value.
Build two visualizations that best illustrate the results from the two pairs of hypothesis tests, one for each null hypothesis.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggthemes)
library(ggrepel)
setwd("C:/Users/kaitl/OneDrive/Documents/590_Working")

#update data types of dataframe
energy <- read_delim("./590_FinalData1.csv", delim = ",", col_types = "nccnncnnnnnnnn")

## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)

#work on a copy and recode the ".." missing-value placeholder as NA
energy1 <- energy
energy1[energy1 == '..'] <- NA

Null hypotheses:

The average proportion of renewable energy that makes up the total final energy consumption is larger than the year before it.
1. power level = (1 - beta error) = the probability of rejecting the null hypothesis when it is actually false. Rejecting the null supports the alternative but does not prove it. I am going to use 0.8 for both hypotheses because we want the power level to be high.
2. alpha level = I think 1 - 0.95 = 0.05 will work fine for this analysis because a Type I error (rejecting the null when it is true, i.e. concluding that the renewable share did not increase when it actually did) will not horrifically alter my results and ultimate decisions based on the data.
3. minimum effect size = I am very unsure how to calculate this measure. One standard choice is Cohen's d: (mean of control group - mean of tested group) / pooled standard deviation.
4. Enough data to perform a Neyman-Pearson hypothesis test? No
5. Fisher test, using fisher.test(select(ctr_table, renenergy, not_ren)):
```
# ctr_table <- energy1 |>
#   group_by(year, country_name) |>
#   summarize(renenergy = sum(ren_energy_output),
#             not_ren = sum(total_elec_output) - sum(ren_energy_output))
# 
# view(ctr_table)
#fisher.test(select(ctr_table, renenergy, not_ren))
```
The result suggests a statistically significant association between the two values.
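The power level, alpha level, and minimum effect size listed above are linked: fixing any three of power, alpha, effect size, and sample size determines the fourth. The sketch below shows one way this could be checked with `power.t.test` from base R's `stats` package. The renewable-share values and the minimum effect size of d = 0.5 are assumptions made up for illustration; they are not taken from 590_FinalData1.csv, and a t-test framing is only one possible choice for this hypothesis.

```r
# Hypothetical renewable-share values (percent) for two consecutive years.
# These numbers are invented for illustration only.
share_prev <- c(12.1, 30.4, 8.7, 45.2, 22.3, 17.8)
share_next <- c(12.9, 31.0, 9.5, 44.8, 23.6, 18.4)

# Cohen's d: difference in means divided by the pooled standard deviation
# (equal group sizes, so the pooled variance is the average of the two variances)
pooled_sd <- sqrt((var(share_prev) + var(share_next)) / 2)
cohens_d <- (mean(share_next) - mean(share_prev)) / pooled_sd

# Sample size per group needed to detect an assumed minimum effect size
# of d = 0.5 at alpha = 0.05 with power = 0.8 (one-sided, two-sample t-test)
needed <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8,
                       type = "two.sample", alternative = "one.sided")
n_per_group <- ceiling(needed$n)

# A Neyman-Pearson test is only well supported if each group reaches this size
enough_data <- length(share_prev) >= n_per_group

print(cohens_d)
print(n_per_group)
print(enough_data)
```

Comparing `n_per_group` with the number of observations actually available per year is one way to back up a "yes" or "no" answer to the Neyman-Pearson question.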
Countries with full access to electricity have a higher proportion of renewable energy output/consumption than countries without full electricity access.
1. power level = 0.8, for the reason explained above.
2. alpha level = 0.05, because a 5% probability of a Type I error (rejecting the null when it is true, i.e. concluding that countries with full electricity access do not have a higher renewable share when they actually do) is acceptable.
3. minimum effect size =
4. Enough data to perform a Neyman-Pearson hypothesis test? No
5. Fisher test:
```
# ctr_table <- energy1 |>
#   group_by(year, country_name) |>
#   summarize(total_consumption = sum(TFEC),
#             not_ren = sum(TFEC) - sum(ren_energy_output))
# 
# ctr_table
#fisher.test(select(ctr_table, total_consumption, not_ren))
```
The result suggests no statistically significant association between the two values.
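`fisher.test` expects a contingency table of counts, so a comparison of electricity access against renewable share needs both variables turned into categories first. The sketch below is a minimal example under stated assumptions: the counts are hypothetical, and splitting countries into "above median" and "below median" renewable share is an illustrative choice, not the method used in the analysis above.

```r
# Hypothetical counts of countries, invented for illustration only.
# Rows: electricity access. Columns: renewable share relative to the median.
access_table <- matrix(c(30, 20,
                         15, 35),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(access = c("full", "partial"),
                                       renewable = c("above_median", "below_median")))

# One-sided Fisher's exact test: is the odds ratio greater than 1, i.e. are
# full-access countries more likely to have an above-median renewable share?
result <- fisher.test(access_table, alternative = "greater")

print(result$p.value)   # p-value to interpret in the Fisher style
print(result$estimate)  # conditional maximum-likelihood estimate of the odds ratio
```

In the Fisher style, the p-value is read as a continuous measure of evidence against the null rather than compared with a pre-set alpha to make an accept/reject decision.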

two visualizations:

2023-10-06