BenfordInIran

This exercise is under construction. Please report any errors at https://forms.gle/2W4tffs4YJA1jeBv9

Goal: Understand and experience Benford’s Law in action.

Background: The election-iran-2009.csv dataset contains the actual results of the 2009 Iranian presidential election, which Mahmoud Ahmadinejad won. Following this election, there were widespread allegations of voter fraud, resulting in protests by millions of people. Your job is to apply Benford’s Law to find out if there was fraud in the election.
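
For reference, Benford's Law predicts that the leading digit d (1 through 9) of many naturally occurring numbers appears with probability log10(1 + 1/d). A minimal base R sketch of these expected frequencies follows; the variable name `benford_expected` is illustrative and not part of the assignment.

```r
# Expected first-digit probabilities under Benford's Law: P(d) = log10(1 + 1/d)
digits <- 1:9
benford_expected <- log10(1 + 1 / digits)
names(benford_expected) <- digits

# Show the probabilities rounded to three decimals
print(round(benford_expected, 3))
```

Under this law a leading 1 is expected about 30.1% of the time, while a leading 9 is expected only about 4.6% of the time.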

Before starting: 1. You are not allowed to search for solutions to this assignment. 2. You are allowed to search for information about packages and functions that can help you.

Individual assignment only: 50 total points (Rmd and html solution). Team assignment: 30 points (written analysis).

[1 point] Q1.

Start by entering your name and today’s date in Lines 3 and 4, respectively. Then, run the chunk of code below by clicking on the green arrow (that points to the right) on the top right of the chunk. Tip: The code chunks are numbered to match their question numbers. Chunk 1 specifies the knitting parameters.

[2 points] Q2.

Before getting started, clear your Environment using the rm command inline. Then, Restart R and Clear Output.

# Clear the environment
rm(list = ls())

[4 points] Q3.

Read, store, and inspect the file election-iran-2009.csv. Tip: Take a peek at the file before you read it. Rubric: 1 point each for reading and storing; 2 points for using 2 R commands for inspecting.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

file_path <- "election-iran-2009.csv"
file_content <- readLines(file_path, n = 5)  # Read the first 5 lines
cat(file_content, sep = "\n")  # Print the content

## Region,Ahmadinejad,% ,Rezai,%,Karrubi,%,Mousavi,%,Total votes,Invalid votes,Valid votes,Eligible voters,"Turnout, %"
## East Azerbaijan,"1,131,111",56.75,"16,920",0.85,"7,246",0.36,"837,858",42.04,"2,010,340","17,205","1,993,135","2,461,553",80.97
## West Azerbaijan,"623,946",47.48,"12,199",0.93,"21,609",1.64,"656,508",49.95,"1,334,356","20,094","1,314,262","1,883,144",69.79
## Ardabil,"325,911",51.11,"6,578",1.03,"2,319",0.36,"302,825",47.49,"642,005","4,372","637,633","804,881",79.22
## Isfahan,"1,799,255",68.88,"51,788",1.98,"14,579",0.56,"746,697",28.58,"2,637,482","25,163","2,612,319","2,987,946",87.43

election_data <- read.csv(file_path)
head(election_data)

##            Region Ahmadinejad    X.  Rezai X..1 Karrubi X..2 Mousavi  X..3
## 1 East Azerbaijan   1,131,111 56.75 16,920 0.85   7,246 0.36 837,858 42.04
## 2 West Azerbaijan     623,946 47.48 12,199 0.93  21,609 1.64 656,508 49.95
## 3         Ardabil     325,911 51.11  6,578 1.03   2,319 0.36 302,825 47.49
## 4         Isfahan   1,799,255 68.88 51,788 1.98  14,579 0.56 746,697 28.58
## 5            Ilam     199,654 64.58  5,221 1.69   7,471 2.42  96,826 31.32
## 6         Bushehr     299,357 61.37  7,608 1.56   3,563 0.73 177,268 36.34
##   Total.votes Invalid.votes Valid.votes Eligible.voters Turnout...
## 1   2,010,340        17,205   1,993,135       2,461,553      80.97
## 2   1,334,356        20,094   1,314,262       1,883,144      69.79
## 3     642,005         4,372     637,633         804,881      79.22
## 4   2,637,482        25,163   2,612,319       2,987,946      87.43
## 5     312,667         3,495     309,172         357,687      86.44
## 6     493,989         6,193     487,796         580,822      83.98

summary(election_data)

##     Region          Ahmadinejad              X.           Rezai          
##  Length:30          Length:30          Min.   :46.07   Length:30         
##  Class :character   Class :character   1st Qu.:59.38   Class :character  
##  Mode  :character   Mode  :character   Median :68.37   Mode  :character  
##                                        Mean   :65.66                     
##                                        3rd Qu.:72.90                     
##                                        Max.   :77.78                     
##                                                                          
##       X..1          Karrubi               X..2          Mousavi         
##  Min.   :0.6800   Length:30          Min.   :0.2400   Length:30         
##  1st Qu.:0.9575   Class :character   1st Qu.:0.4225   Class :character  
##  Median :1.1800   Mode  :character   Median :0.6250   Mode  :character  
##  Mean   :1.5817                      Mean   :0.9377                     
##  3rd Qu.:1.5600                      3rd Qu.:1.1550                     
##  Max.   :6.9200                      Max.   :4.6100                     
##                                                                         
##       X..3       Total.votes        Invalid.votes      Valid.votes       
##  Min.   :20.49   Length:30          Length:30          Length:30         
##  1st Qu.:24.48   Class :character   Class :character   Class :character  
##  Median :28.39   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.82                                                           
##  3rd Qu.:38.44                                                           
##  Max.   :51.97                                                           
##                                                                          
##  Eligible.voters      Turnout...   
##  Length:30          Min.   :63.41  
##  Class :character   1st Qu.:80.25  
##  Mode  :character   Median :86.04  
##                     Mean   :84.33  
##                     3rd Qu.:88.06  
##                     Max.   :99.43  
##                     NA's   :3

[8 points] Q4.

Describe any problems that you observe after inspecting the data. Then, fix these problems. Tip: Google read_csv. Rubric: 2 points for identifying problems; 8 points for fixing them.

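# Problems observed:
# (1) the four "%" columns share the same header, so read.csv renamed them X., X..1, X..2, X..3;
# (2) some cells are empty strings rather than NA;
# (3) the vote counts contain thousands separators and were read as character, while the percentages were read as numeric.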
library(dplyr)
# Fixing duplicated column names
print(colnames(election_data))

##  [1] "Region"          "Ahmadinejad"     "X."              "Rezai"          
##  [5] "X..1"            "Karrubi"         "X..2"            "Mousavi"        
##  [9] "X..3"            "Total.votes"     "Invalid.votes"   "Valid.votes"    
## [13] "Eligible.voters" "Turnout..."

# Change the column names
new_column_names <- c("Region", "Ahmadinejad", "Percentage_Ahmadinejad", "Rezai", "Percentage_Rezai",
                      "Karrubi", "Percentage_Karrubi", "Mousavi", "Percentage_Mousavi",
                      "Total_votes", "Invalid_votes", "Valid_votes", "Eligible_voters", "Turnout_percentage")

colnames(election_data) <- new_column_names

# Print the updated column names
print(colnames(election_data))

##  [1] "Region"                 "Ahmadinejad"            "Percentage_Ahmadinejad"
##  [4] "Rezai"                  "Percentage_Rezai"       "Karrubi"               
##  [7] "Percentage_Karrubi"     "Mousavi"                "Percentage_Mousavi"    
## [10] "Total_votes"            "Invalid_votes"          "Valid_votes"           
## [13] "Eligible_voters"        "Turnout_percentage"

eligible_voters_value <- election_data %>%
  filter(Region == "South Khorasan") %>%
  pull(Eligible_voters)

# Print the result
cat(eligible_voters_value)

# The missing entries are empty strings, not NA, so convert them to NA
election_data <- election_data %>%
  mutate(across(everything(), ~ ifelse(. == "", NA, .)))

# Identify numeric columns
numeric_cols <- names(election_data)[sapply(election_data, is.numeric)]

# Impute missing values in numeric columns with the column mean.
# all_of() avoids the tidyselect deprecation warning for external vectors.
election_data_imputed <- election_data %>%
  mutate(across(all_of(numeric_cols), ~ ifelse(is.na(.), mean(., na.rm = TRUE), .)))

# Print the updated data
head(election_data_imputed)

##            Region Ahmadinejad Percentage_Ahmadinejad  Rezai Percentage_Rezai
## 1 East Azerbaijan   1,131,111                  56.75 16,920             0.85
## 2 West Azerbaijan     623,946                  47.48 12,199             0.93
## 3         Ardabil     325,911                  51.11  6,578             1.03
## 4         Isfahan   1,799,255                  68.88 51,788             1.98
## 5            Ilam     199,654                  64.58  5,221             1.69
## 6         Bushehr     299,357                  61.37  7,608             1.56
##   Karrubi Percentage_Karrubi Mousavi Percentage_Mousavi Total_votes
## 1   7,246               0.36 837,858              42.04   2,010,340
## 2  21,609               1.64 656,508              49.95   1,334,356
## 3   2,319               0.36 302,825              47.49     642,005
## 4  14,579               0.56 746,697              28.58   2,637,482
## 5   7,471               2.42  96,826              31.32     312,667
## 6   3,563               0.73 177,268              36.34     493,989
##   Invalid_votes Valid_votes Eligible_voters Turnout_percentage
## 1        17,205   1,993,135       2,461,553              80.97
## 2        20,094   1,314,262       1,883,144              69.79
## 3         4,372     637,633         804,881              79.22
## 4        25,163   2,612,319       2,987,946              87.43
## 5         3,495     309,172         357,687              86.44
## 6         6,193     487,796         580,822              83.98

# Check the data types
str(election_data_imputed)

## 'data.frame':    30 obs. of  14 variables:
##  $ Region                : chr  "East Azerbaijan" "West Azerbaijan" "Ardabil" "Isfahan" ...
##  $ Ahmadinejad           : chr  "1,131,111" "623,946" "325,911" "1,799,255" ...
##  $ Percentage_Ahmadinejad: num  56.8 47.5 51.1 68.9 64.6 ...
##  $ Rezai                 : chr  "16,920" "12,199" "6,578" "51,788" ...
##  $ Percentage_Rezai      : num  0.85 0.93 1.03 1.98 1.69 1.56 1.99 4.61 1.04 1.42 ...
##  $ Karrubi               : chr  "7,246" "21,609" "2,319" "14,579" ...
##  $ Percentage_Karrubi    : num  0.36 1.64 0.36 0.56 2.42 0.73 0.91 0.84 0.24 0.43 ...
##  $ Mousavi               : chr  "837,858" "656,508" "302,825" "746,697" ...
##  $ Percentage_Mousavi    : num  42 50 47.5 28.6 31.3 ...
##  $ Total_votes           : chr  "2,010,340" "1,334,356" "642,005" "2,637,482" ...
##  $ Invalid_votes         : chr  "17,205" "20,094" "4,372" "25,163" ...
##  $ Valid_votes           : chr  "1,993,135" "1,314,262" "637,633" "2,612,319" ...
##  $ Eligible_voters       : chr  "2,461,553" "1,883,144" "804,881" "2,987,946" ...
##  $ Turnout_percentage    : num  81 69.8 79.2 87.4 86.4 ...

# Convert the character vote counts to numeric by stripping the thousands separators
election_data_imputed <- election_data_imputed %>%
  mutate(across(c(Ahmadinejad, Rezai, Karrubi, Mousavi, Total_votes, Invalid_votes, Valid_votes, Eligible_voters),
                ~ as.numeric(gsub(",", "", .))))

# Fill the three missing Eligible_voters values from Valid_votes and Turnout_percentage.
# Note: Turnout_percentage had 3 NAs that were mean-imputed above, so these
# estimates may rest on imputed turnout.
election_data_imputed <- election_data_imputed %>%
  mutate(
    Eligible_voters = ifelse(
      is.na(Eligible_voters),
      Valid_votes / Turnout_percentage * 100,
      Eligible_voters
    )
  )
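
As an alternative to repairing the data frame after the fact, some of these problems can be handled at read time. The sketch below uses base R `read.csv` (not the `read_csv` function the tip refers to) on a small made-up CSV written to a temporary file. The file contents and the helper name `to_number` are hypothetical.

```r
# Write a tiny made-up CSV that mimics the quirks of the election file:
# quoted counts with thousands separators and an empty cell
csv_lines <- c(
  "Region,Votes,Eligible",
  "A,\"1,131,111\",\"2,461,553\"",
  "B,\"623,946\","
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# na.strings turns empty cells into NA at read time
demo <- read.csv(tmp, na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Hypothetical helper: strip thousands separators, then convert to numeric
to_number <- function(x) as.numeric(gsub(",", "", x))

count_cols <- c("Votes", "Eligible")
demo[count_cols] <- lapply(demo[count_cols], to_number)

str(demo)
```

Declaring the missing-value strings up front means no separate pass is needed to turn empty strings into NA before the numeric conversion.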

[30 points] Q5.

Apply Benford’s Law to analyze the results on the data. Tip: Find another exercise on the web that applies Benford’s Analysis and then apply what you learned to your analysis. I am not expecting anything beyond 1st digit analysis in this question.

# Load the benford.analysis package (install it first if needed)
# install.packages("benford.analysis")
library(benford.analysis)

## Warning: package 'benford.analysis' was built under R version 4.3.2

# Identify the numeric columns of election_data_imputed (Region is character, so it is excluded)
numeric_cols <- sapply(election_data_imputed, is.numeric) & names(election_data_imputed) != "Region"

# Apply Benford's Law analysis to each numeric column
benford_results <- lapply(election_data_imputed[, numeric_cols, drop = FALSE], function(column) {
  # Extract the first character of each value as a digit.
  # Values below 1 (e.g. 0.85) yield 0 and are dropped by benford(), which is why
  # some percentage columns report fewer than 30 observations.
  first_digits <- as.numeric(substr(as.character(column), 1, 1))

  # benford() defaults to number.of.digits = 2, so the single digits 1-9 are
  # reported as the two-digit groups 10, 20, ..., 90 in the output
  benford(first_digits)
})

# Print the results for each column
for (i in seq_along(benford_results)) {
  print(paste("Column:", names(benford_results)[i]))
  print(benford_results[[i]])
}

## [1] "Column: Ahmadinejad"
## 
## Benford object:
##  
## Data: first_digits 
## Number of observations used = 30 
## Number of obs. for second order = 7 
## First digits analysed = 2
## 
## Mantissa: 
## 
##    Statistic  Value
##         Mean  0.428
##          Var  0.084
##  Ex.Kurtosis -1.024
##     Skewness -0.238
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1     10          5.76
## 2     30          5.57
## 3     40          4.68
## 4     20          4.36
## 5     50          2.74
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  first_digits
## X-squared = 276.58, df = 89, p-value < 2.2e-16
## 
## 
##  Mantissa Arc Test
## 
## data:  first_digits
## L2 = 0.026533, df = 2, p-value = 0.4511
## 
## Mean Absolute Deviation (MAD): 0.01968257
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Percentage_Ahmadinejad"
## 
## Benford object:
##  
## Data: first_digits 
## Number of observations used = 30 
## Number of obs. for second order = 3 
## First digits analysed = 2
## 
## Mantissa: 
## 
##    Statistic   Value
##         Mean  0.7796
##          Var  0.0054
##  Ex.Kurtosis -0.0144
##     Skewness -0.9327
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1     70         12.82
## 2     60          8.78
## 3     50          5.74
## 4     40          1.68
## 5     10          1.24
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  first_digits
## X-squared = 1412.5, df = 89, p-value < 2.2e-16
## 
## 
##  Mantissa Arc Test
## 
## data:  first_digits
## L2 = 0.81259, df = 2, p-value = 2.588e-11
## 
## Mean Absolute Deviation (MAD): 0.02149638
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Rezai"
## 
## Benford object:
##  
## Data: first_digits 
## Number of observations used = 30 
## Number of obs. for second order = 7 
## First digits analysed = 2
## 
## Mantissa: 
## 
##    Statistic  Value
##         Mean  0.419
##          Var  0.142
##  Ex.Kurtosis -1.756
##     Skewness -0.063
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1     10         10.76
## 2     70          4.82
## 3     50          2.74
## 4     40          2.68
## 5     80          1.84
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  first_digits
## X-squared = 336.02, df = 89, p-value < 2.2e-16
## 
## 
##  Mantissa Arc Test
## 
## data:  first_digits
## L2 = 0.26339, df = 2, p-value = 0.0003702
## 
## Mean Absolute Deviation (MAD): 0.01966932
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Percentage_Rezai"
## 
## Benford object:
##  
## Data: first_digits 
## Number of observations used = 21 
## Number of obs. for second order = 3 
## First digits analysed = 2
## 
## Mantissa: 
## 
##    Statistic Value
##         Mean 0.094
##          Var 0.048
##  Ex.Kurtosis 3.579
##     Skewness 2.204
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1     10         16.13
## 2     20          1.56
## 3     60          0.85
## 4     11          0.79
## 5     40          0.77
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  first_digits
## X-squared = 331.54, df = 89, p-value < 2.2e-16
## 
## 
##  Mantissa Arc Test
## 
## data:  first_digits
## L2 = 0.56232, df = 2, p-value = 7.44e-06
## 
## Mean Absolute Deviation (MAD): 0.02043368
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Karrubi"
## 
## Benford object:
##  
## Data: first_digits 
## Number of observations used = 30 
## Number of obs. for second order = 7 
## First digits analysed = 2
## 
## Mantissa: 
## 
##    Statistic Value
##         Mean  0.36
##          Var  0.10
##  Ex.Kurtosis -1.25
##     Skewness  0.27
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1     10          8.76
## 2     20          7.36
## 3     40          4.68
## 4     70          2.82
## 5     11          1.13
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  first_digits
## X-squared = 295.42, df = 89, p-value < 2.2e-16
## 
## 
##  Mantissa Arc Test
## 
## data:  first_digits
## L2 = 0.027667, df = 2, p-value = 0.436
## 
## Mean Absolute Deviation (MAD): 0.01968257
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Percentage_Karrubi"
## 
## Benford object:
##  
## Data: first_digits 
## Number of observations used = 9 
## Number of obs. for second order = 2 
## First digits analysed = 2
## 
## Mantissa: 
## 
##    Statistic Value
##         Mean 0.134
##          Var 0.048
##  Ex.Kurtosis 0.170
##     Skewness 1.238
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1     10          5.63
## 2     20          1.81
## 3     40          0.90
## 4     11          0.34
## 5     12          0.31
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  first_digits
## X-squared = 118.97, df = 89, p-value = 0.01861
## 
## 
##  Mantissa Arc Test
## 
## data:  first_digits
## L2 = 0.27851, df = 2, p-value = 0.08155
## 
## Mean Absolute Deviation (MAD): 0.0205932
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Mousavi"
## 
## Benford object:
##  
## Data: first_digits 
## Number of observations used = 30 
## Number of obs. for second order = 8 
## First digits analysed = 2
## 
## Mantissa: 
## 
##    Statistic Value
##         Mean  0.49
##          Var  0.12
##  Ex.Kurtosis -1.33
##     Skewness -0.16
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1     10          5.76
## 2     30          4.57
## 3     20          4.36
## 4     90          2.86
## 5     70          2.82
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  first_digits
## X-squared = 285.87, df = 89, p-value < 2.2e-16
## 
## 
##  Mantissa Arc Test
## 
## data:  first_digits
## L2 = 0.042186, df = 2, p-value = 0.2821
## 
## Mean Absolute Deviation (MAD): 0.01956268
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Percentage_Mousavi"
## 
## Benford object:
##  
## Data: first_digits 
## Number of observations used = 30 
## Number of obs. for second order = 3 
## First digits analysed = 2
## 
## Mantissa: 
## 
##    Statistic  Value
##         Mean  0.416
##          Var  0.018
##  Ex.Kurtosis -1.200
##     Skewness  0.564
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1     20         15.36
## 2     30          6.57
## 3     40          5.68
## 4     10          1.24
## 5     11          1.13
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  first_digits
## X-squared = 603.19, df = 89, p-value < 2.2e-16
## 
## 
##  Mantissa Arc Test
## 
## data:  first_digits
## L2 = 0.47544, df = 2, p-value = 6.391e-07
## 
## Mean Absolute Deviation (MAD): 0.02100547
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Total_votes"
## 
## Benford object:
##  
## Data: first_digits 
## Number of observations used = 30 
## Number of obs. for second order = 8 
## First digits analysed = 2
## 
## Mantissa: 
## 
##    Statistic Value
##         Mean  0.54
##          Var  0.10
##  Ex.Kurtosis -0.94
##     Skewness -0.48
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1     30          4.57
## 2     60          3.78
## 3     10          3.76
## 4     20          3.36
## 5     90          2.86
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  first_digits
## X-squared = 308.99, df = 89, p-value < 2.2e-16
## 
## 
##  Mantissa Arc Test
## 
## data:  first_digits
## L2 = 0.05855, df = 2, p-value = 0.1726
## 
## Mean Absolute Deviation (MAD): 0.01956268
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Invalid_votes"
## 
## Benford object:
##  
## Data: first_digits 
## Number of observations used = 30 
## Number of obs. for second order = 8 
## First digits analysed = 2
## 
## Mantissa: 
## 
##    Statistic Value
##         Mean  0.39
##          Var  0.12
##  Ex.Kurtosis -1.40
##     Skewness  0.16
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1     10          8.76
## 2     20          5.36
## 3     50          3.74
## 4     30          2.57
## 5     90          1.86
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  first_digits
## X-squared = 251.3, df = 89, p-value < 2.2e-16
## 
## 
##  Mantissa Arc Test
## 
## data:  first_digits
## L2 = 0.055737, df = 2, p-value = 0.1878
## 
## Mean Absolute Deviation (MAD): 0.01956268
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Valid_votes"
## 
## Benford object:
##  
## Data: first_digits 
## Number of observations used = 30 
## Number of obs. for second order = 8 
## First digits analysed = 2
## 
## Mantissa: 
## 
##    Statistic Value
##         Mean  0.53
##          Var  0.11
##  Ex.Kurtosis -1.04
##     Skewness -0.47
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1     10          4.76
## 2     30          4.57
## 3     90          2.86
## 4     70          2.82
## 5     60          2.78
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  first_digits
## X-squared = 293.71, df = 89, p-value < 2.2e-16
## 
## 
##  Mantissa Arc Test
## 
## data:  first_digits
## L2 = 0.077866, df = 2, p-value = 0.09671
## 
## Mean Absolute Deviation (MAD): 0.01956268
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Eligible_voters"
## 
## Benford object:
##  
## Data: first_digits 
## Number of observations used = 30 
## Number of obs. for second order = 8 
## First digits analysed = 2
## 
## Mantissa: 
## 
##    Statistic Value
##         Mean  0.46
##          Var  0.13
##  Ex.Kurtosis -1.51
##     Skewness -0.16
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1     10          7.76
## 2     20          3.36
## 3     80          2.84
## 4     60          2.78
## 5     50          2.74
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  first_digits
## X-squared = 263.21, df = 89, p-value < 2.2e-16
## 
## 
##  Mantissa Arc Test
## 
## data:  first_digits
## L2 = 0.11771, df = 2, p-value = 0.02927
## 
## Mean Absolute Deviation (MAD): 0.01956268
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Turnout_percentage"
## 
## Benford object:
##  
## Data: first_digits 
## Number of observations used = 30 
## Number of obs. for second order = 3 
## First digits analysed = 2
## 
## Mantissa: 
## 
##    Statistic   Value
##         Mean  0.8956
##          Var  0.0019
##  Ex.Kurtosis  1.4428
##     Skewness -1.1274
## 
## 
## The 5 largest deviations: 
## 
##   digits absolute.diff
## 1     80         18.84
## 2     90          4.86
## 3     70          3.82
## 4     60          1.78
## 5     10          1.24
## 
## Stats:
## 
##  Pearson's Chi-squared test
## 
## data:  first_digits
## X-squared = 2479.2, df = 89, p-value < 2.2e-16
## 
## 
##  Mantissa Arc Test
## 
## data:  first_digits
## L2 = 0.92952, df = 2, p-value = 7.751e-13
## 
## Mean Absolute Deviation (MAD): 0.02169927
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
## 
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!

# Visualize the expected vs. observed distribution for each column
for (i in seq_along(benford_results)) {
  plot(benford_results[[i]], main = paste("Benford's Law Analysis -", names(benford_results)[i]))
}
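
Because the question asks only for a first-digit analysis, the comparison can also be done directly in base R without `benford.analysis`. The sketch below is an illustration under stated assumptions rather than the assignment's solution: the helper `first_digit` is hypothetical, and a synthetic vector of powers of two stands in for the election columns. With only 30 provinces per column the expected counts are small, so `chisq.test` may warn that its approximation is unreliable.

```r
# Hypothetical helper: leading significant digit of each positive number,
# read from scientific notation so that values below 1 (e.g. 0.85) work too
first_digit <- function(x) {
  x <- x[!is.na(x) & x > 0]
  as.integer(substr(formatC(x, format = "e", digits = 15), 1, 1))
}

# Synthetic stand-in data: powers of two are known to follow Benford's Law closely
x <- 2^(0:99)

# Observed counts for digits 1-9 (factor levels keep digits with zero count)
observed <- table(factor(first_digit(x), levels = 1:9))

# Expected proportions under Benford's Law
expected_p <- log10(1 + 1 / (1:9))

# Side-by-side comparison of observed and expected proportions
comparison <- data.frame(
  digit = 1:9,
  observed = as.vector(observed) / sum(observed),
  expected = expected_p
)
print(comparison, digits = 3)

# Chi-squared goodness-of-fit test against the Benford proportions (df = 8)
print(chisq.test(observed, p = expected_p))
```

To try this on the cleaned data, a column such as `election_data_imputed$Ahmadinejad` could be passed to `first_digit` in place of `x`. Unlike passing pre-extracted digits to `benford()` with its default settings, this compares the nine leading digits directly, giving a test with 8 degrees of freedom.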

[5 points] Q6.

Knit to html after eliminating all the errors. Submit both the Rmd and html files. Tip: Do not worry about minor formatting issues.

### This section doesn't require code. Just knit and submit the Rmd and html files.###

[30 points] Team Question 1 (Answer in the team assignment)

Compare and analyze the results with your team and answer the following question: Argue your case for or against the election fraud based on your analysis. Explain your results in the process and suggest next steps for data analysts (not legal/political professionals). Tip: Individual and Team assignments are related and you may need a few iterations after discussing with your team. You may change your code as you consider and develop different approaches. You must clearly take a position whether you have a case based on data, not opinions. Tip: You can share what other analysis you would continue doing if this was a problem you were solving IRL (In Real Life).