Task 1

I will build on my previous research: Political Consequences of the Implementation of Municipal Heads’ Appointments in Russia: 2016 – 2018.

RQ: Does the dismantling of direct elections at the local level undermine local political machines and, thereby, their capacity to deliver votes? Or, conversely, does it consolidate political support for the regime?

Theoretically, we can expect ambiguous results:

H0: There is no difference in turnout and vote shares in federal presidential elections between municipalities with elected and appointed mayors.

H1: Appointees deliver higher turnout in federal presidential elections than elected mayors.

I estimate the prior probability that H1 is true at 0.5.

Probability of a Type II error = 0.2 (power = 0.8)

So, with a 50% chance that H1 is true, 80% power, and a 5% significance level, 1000 hypothetical studies would yield:

  1. a true positive result = 400
  2. a true negative result = 475
  3. a false positive result = 25
  4. a false negative result = 100
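
The four counts follow directly from the prior, the power, and the significance level. A minimal sketch of that arithmetic in R, assuming 1000 hypothetical studies (the variable names are illustrative and not part of the assignment):

```r
# Assumed inputs, taken from the text above
n_studies <- 1000   # hypothetical number of studies
prior_h1  <- 0.5    # probability that H1 is true
power     <- 0.8    # 1 - P(Type II error)
alpha     <- 0.05   # significance level

n_h1 <- n_studies * prior_h1        # studies in which H1 is true
n_h0 <- n_studies * (1 - prior_h1)  # studies in which H0 is true

outcomes <- c(
  true_positive  = n_h1 * power,
  true_negative  = n_h0 * (1 - alpha),
  false_positive = n_h0 * alpha,
  false_negative = n_h1 * (1 - power)
)
print(outcomes)

# Share of significant results that reflect a true effect
ppv <- outcomes[["true_positive"]] /
  (outcomes[["true_positive"]] + outcomes[["false_positive"]])
print(round(ppv, 3))
```

Under these assumptions the sketch reproduces the counts listed above, and about 94% of the significant results (400 of 425) would correspond to a true effect.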

Literature:

Beazer, Q., Reuter, O. (2021). Do Authoritarian Elections Help the Poor? Evidence from Russian Cities. The Journal of Politics, 84(1), 437-454.

Harvey, C. (2022). Who delivers the votes? Elected versus appointed local executives, election manipulation, and natural support for ruling parties. Electoral Studies, 22, 0-11.

Reuter, O., Buckley, N., Shubenkova, A., Garifullina, G. (2016). Local Elections in Authoritarian Regimes: An Elite-Based Theory With Evidence From Russian Mayoral Elections. Comparative Political Studies, 49(5), 662–697. doi:10.1177/0010414015626439

Nye, J., Vasilyeva, O. (2015). When does local political competition lead to more public goods?: Evidence from Russian regions. Journal of Comparative Economics, 43(3), 650-676.

Task 2

Simulation Data

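library(dplyr)    # %>%, group_by(), summarise()
library(rstatix)  # get_summary_stats()

set.seed(456)
group = rep(letters[1:2], each = 500)
# mean = c(1.2, 3.5) is recycled element by element across the 1000 draws
y = rnorm(n = 1000, mean = c(1.2,3.5), sd=3)
head(df <- data.frame(group,
           y))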
##   group          y
## 1     a -2.8305642
## 2     a  5.3653267
## 3     a  3.6026240
## 4     a -0.6666772
## 5     a -0.9430706
## 6     a  2.5278168
df %>% 
  group_by(group) %>%
  summarise(mcount = mean(y),
            varcount = var(y))
## # A tibble: 2 × 3
##   group mcount varcount
##   <chr>  <dbl>    <dbl>
## 1 a       2.64     10.1
## 2 b       2.39     10.1

Overall mean, median and standard deviation of Y

df %>% 
  get_summary_stats(type = "common")
## # A tibble: 1 × 10
##   variable     n   min   max median   iqr  mean    sd    se    ci
##   <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 y         1000 -7.69  12.7   2.63  4.24  2.52  3.18   0.1 0.197

Mean, median and standard deviation of Y by two groups

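The means and standard deviations in the two groups differ from the values I set when simulating the data: for group ‘a’ I set mean = 1.2, sd = 3, and for group ‘b’ mean = 3.5, sd = 3. Instead we see mean = 2.64, sd = 3.17 for group ‘a’ and mean = 2.39, sd = 3.18 for group ‘b’.

Why is it so?

Part of the gap is ordinary sampling variation. We draw a random sample from a population (in reality, a pseudo-random imitation of random selection), so the sample mean and variance never match the true values exactly. With a much larger sample, say 1000000 observations, they would be closer to the true values. But sampling variation alone cannot explain a gap this large: with sd = 3 and n = 500 the standard error of a group mean is about 0.13, so 2.64 is more than ten standard errors away from 1.2.

The main reason is how rnorm() handles the mean argument. The vector mean = c(1.2, 3.5) is recycled element by element, so the draws alternate between mean 1.2 and mean 3.5, while group = rep(letters[1:2], each = 500) labels the first 500 draws ‘a’ and the last 500 ‘b’. Each group is therefore an equal mix of both distributions, with expected mean (1.2 + 3.5) / 2 = 2.35 and expected variance 3^2 + 1.15^2 ≈ 10.3 (sd ≈ 3.2), which is what the output shows. To give each group its own mean, the mean vector would have to be built the same way as group, e.g. mean = rep(c(1.2, 3.5), each = 500).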
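
One thing worth checking in the simulation is how rnorm() pairs a vector of means with the group labels. The sketch below compares the recycled mean vector with a mean vector aligned to the groups. The names mu_recycled, mu_aligned and y_aligned are illustrative, and the aligned version is a hypothetical alternative, not the code used for the results in this report:

```r
n_per_group <- 500
group <- rep(letters[1:2], each = n_per_group)

# What rnorm() effectively uses when mean = c(1.2, 3.5): the vector is recycled
mu_recycled <- rep_len(c(1.2, 3.5), length(group))   # 1.2, 3.5, 1.2, 3.5, ...
# A mean vector aligned with the group labels (hypothetical alternative)
mu_aligned  <- rep(c(1.2, 3.5), each = n_per_group)  # 1.2 x 500, then 3.5 x 500

# Expected group means under each specification (no randomness involved)
print(tapply(mu_recycled, group, mean))  # both groups: 2.35
print(tapply(mu_aligned,  group, mean))  # a: 1.2, b: 3.5

# Drawing with the aligned means should give group means near 1.2 and 3.5
set.seed(456)
y_aligned <- rnorm(n = length(group), mean = mu_aligned, sd = 3)
print(tapply(y_aligned, group, mean))
```

With the recycled vector both groups share the same expected mean, so any test comparing them is effectively run under a true null hypothesis.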

df %>% 
  group_by(group) %>% 
  get_summary_stats(type = "common") 
## # A tibble: 2 × 11
##   group variable     n   min   max median   iqr  mean    sd    se    ci
##   <chr> <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 a     y          500 -5.94  12.5   2.84  4.34  2.64  3.17 0.142 0.279
## 2 b     y          500 -7.69  12.7   2.42  4.12  2.39  3.18 0.142 0.279

T statistics

H0: µ1 = µ2 (the population means of the two groups are equal)

H1: µ1 ≠ µ2

t = (x̄1 - x̄2) / √(s1^2 / n1 + s2^2 / n2)

t = (2.64 - 2.39) / √(10.1 / 500 + 10.1 / 500) = 0.25 / 0.201 ≈ 1.24. This means that the observed difference between the two sample means lies about 1.24 standard errors away from the value expected under H0 (0).

To turn this into a p-value we can use the online calculator https://gallery.shinyapps.io/dist_calc/ or a Z-score table.

p-value = P(X < -1.24 or X > 1.24) ≈ 0.21

This probability is well above 0.05, so we cannot reject the null hypothesis: there is no statistically significant difference between the two group means. The Welch t-test below gives the same result.
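
The hand calculation can also be checked numerically. A minimal sketch in R, assuming the rounded group means and standard deviations from the by-group summary above (the variable names are illustrative); the result should land close to the t.test() output reported below:

```r
# Summary statistics copied (rounded) from the by-group output
m1 <- 2.64; s1 <- 3.17; n1 <- 500   # group a
m2 <- 2.39; s2 <- 3.18; n2 <- 500   # group b

# Standard error of the difference in means (unequal-variance form)
v1 <- s1^2 / n1
v2 <- s2^2 / n2
se <- sqrt(v1 + v2)

# t statistic: difference in sample means in units of its standard error
t_stat <- (m1 - m2) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-values from the t distribution and the normal approximation
p_t <- 2 * pt(-abs(t_stat), df = df_welch)
p_z <- 2 * pnorm(-abs(t_stat))

print(round(c(se = se, t = t_stat, df = df_welch, p_t = p_t, p_z = p_z), 4))
```

With about 998 degrees of freedom the t distribution is almost identical to the standard normal, which is why a Z-score table is a reasonable shortcut here.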

df$group <- factor(df$group)
t.test(y ~ group, data = df)
## 
##  Welch Two Sample t-test
## 
## data:  y by group
## t = 1.2452, df = 997.99, p-value = 0.2134
## alternative hypothesis: true difference in means between group a and group b is not equal to 0
## 95 percent confidence interval:
##  -0.1439696  0.6438918
## sample estimates:
## mean in group a mean in group b 
##        2.644448        2.394487