Determine if you have enough data to perform a Neyman-Pearson hypothesis test. If you do, perform one and interpret results. If not, explain why.
Perform a Fisher-style test for significance, and interpret the p-value.
Build two visualizations that best illustrate the results from the two pairs of hypothesis tests, one for each null hypothesis.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(ggrepel)
setwd("C:/Users/kaitl/OneDrive/Documents/590_Working")
#update data types of dataframe
energy <- read_delim("./590_FinalData1.csv", delim = ",", col_types = "nccnncnnnnnnnn")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
energy1 <- energy
energy1[energy1 == '..'] <- NA
Null hypothesis:
The average proportion of total final energy consumption that comes from renewable energy is larger than in the year before it.
alpha level = 0.05 (1 - 0.95). I think this will work fine for this analysis because a Type I error (rejecting the null when it is true, basically concluding that renewable energy proportions decreased when they did not) will not seriously alter my results or the decisions I ultimately base on the data.
minimum effect size = I am very unsure how to calculate this measure. (Mean of control group - mean of tested group) / standard deviation
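The formula noted above (difference in group means divided by a standard deviation) matches the standardized mean difference usually called Cohen's d. A sketch of it written out, assuming the pooled standard deviation is the intended denominator (the note above only says "std dev", so that choice is an assumption), with subscript c for the control group and t for the tested group:

$$
d = \frac{\bar{x}_c - \bar{x}_t}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_c - 1)\,s_c^2 + (n_t - 1)\,s_t^2}{n_c + n_t - 2}}
$$

The parentheses matter: the difference of the means is taken first, and only then divided by the standard deviation.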
Enough data to perform a Neyman-Pearson hypothesis test? No
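Fisher test:
# ctr_table <- energy1 |>
#   group_by(year, country_name) |>
#   # na.rm = TRUE so the '..' placeholders converted to NA above do not turn a sum into NA;
#   # .groups = "drop" so select() below does not silently keep the year grouping column
#   summarize(renenergy = sum(ren_energy_output, na.rm = TRUE),
#             not_ren = sum(total_elec_output, na.rm = TRUE) - sum(ren_energy_output, na.rm = TRUE),
#             .groups = "drop")
#
# view(ctr_table)
# fisher.test(select(ctr_table, renenergy, not_ren))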
The test suggests that there is a statistically significant difference between the two values.
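For reference, `fisher.test()` expects a contingency table of counts (non-negative integers), and the p-value can be read from the returned object. The following is a minimal sketch only: the 2x2 counts and the names `toy_counts`, `year_a`, and `year_b` are hypothetical and do not come from 590_FinalData1.csv.

```r
# Hypothetical 2x2 table of counts (NOT taken from the energy data):
# rows = two years, columns = number of countries whose renewable share
# increased or did not increase.
toy_counts <- matrix(
  c(30, 20,
    18, 32),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    year = c("year_a", "year_b"),
    renewable_share = c("increased", "did_not_increase")
  )
)

# Fisher's exact test of independence between rows and columns
result <- fisher.test(toy_counts)

result$p.value   # probability of a table at least this extreme if the null is true
result$estimate  # conditional maximum-likelihood estimate of the odds ratio
```

In Fisher's approach the p-value is read as a continuous measure of evidence against the null hypothesis, rather than compared against a fixed decision rule.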
Null hypothesis:
Countries with full access to electricity have a higher proportion of renewable energy output/consumption than countries without full electricity access.
power level = 0.8 (same explanation as above).
alpha level = 0.05, because a 5% probability of a Type I error (wrongly concluding that countries with full electricity access do not have a higher proportion of renewable energy than countries without it) is acceptable.
minimum effect size =
Enough data to perform a Neyman-Pearson hypothesis test? No
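Whether there is enough data for a Neyman-Pearson test depends on the alpha level, the power level, and the minimum effect size together. The sketch below shows how the three combine into a required sample size using `power.t.test()` from base R's stats package. It assumes a one-sided, two-sample t-test is a reasonable stand-in for comparing the two groups of countries, and `placeholder_d = 0.5` is purely hypothetical because the minimum effect size above is not filled in.

```r
alpha <- 0.05          # alpha level stated above
target_power <- 0.8    # power level stated above
placeholder_d <- 0.5   # HYPOTHETICAL standardized effect size, not from the data

# With sd = 1, delta is expressed in standard-deviation units (Cohen's d).
# n is left unspecified, so power.t.test() solves for it.
power_result <- power.t.test(
  delta = placeholder_d,
  sd = 1,
  sig.level = alpha,
  power = target_power,
  type = "two.sample",
  alternative = "one.sided"
)

# Round up: observations needed in EACH group
n_per_group <- ceiling(power_result$n)
n_per_group
```

Comparing `n_per_group` with the number of countries available in each group (with and without full electricity access) would then give a concrete reason for answering yes or no.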
Fisher test:
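# ctr_table <- energy1 |>
#   group_by(year, country_name) |>
#   # na.rm = TRUE so the '..' placeholders converted to NA above do not turn a sum into NA;
#   # .groups = "drop" so select() below does not silently keep the year grouping column
#   summarize(total_consumption = sum(TFEC, na.rm = TRUE),
#             not_ren = sum(TFEC, na.rm = TRUE) - sum(ren_energy_output, na.rm = TRUE),
#             .groups = "drop")
#
# ctr_table
# fisher.test(select(ctr_table, total_consumption, not_ren))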
Two visualizations: