This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
This exercise is under construction. Please report any errors at https://forms.gle/2W4tffs4YJA1jeBv9
Goal: Understand and experience Benford’s Law in action.
Background: The election-iran-2009.csv dataset contains the actual results of the 2009 Iranian presidential election, which Mahmoud Ahmadinejad won. Following this election, there were widespread allegations of voter fraud, and millions of people protested. Your job is to apply Benford’s Law to find out whether there was fraud in the election.
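For reference, Benford's Law says that the leading digit d (1 through 9) of many naturally occurring numbers appears with probability log10(1 + 1/d). The short R sketch below only tabulates those expected shares. The object name `benford_expected` is made up for illustration and is not part of the assignment files.

```r
# Expected first-digit probabilities under Benford's Law: P(d) = log10(1 + 1/d)
digits <- 1:9
benford_expected <- data.frame(
  digit = digits,
  expected_share = log10(1 + 1 / digits)
)

# The shares sum to 1; digit 1 is expected about 30.1% of the time, digit 9 about 4.6%
print(round(benford_expected, 3))
```

Any observed first-digit distribution can be compared against the `expected_share` column.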
Before starting: 1. You are not allowed to search for solutions to this assignment. 2. You are allowed to search for information about packages and functions that can help you.
Individual assignment only: 50 total points (Rmd and html solution) Team assignment: 30 points (written analysis)
Start by entering your name and today’s date in Lines 3 and 4, respectively. Then run the chunk of code below by clicking the green right-pointing arrow at the top right of the chunk. Tip: Code chunks are numbered to match their question numbers. Chunk 1 specifies the knitting parameters.
Before getting started, clear your Environment using the rm command inline. Then, Restart R and Clear Output.
# Clear the environment
rm(list = ls())
Read, store, and inspect the file election-iran-2009.csv. Tip: Take a peek at the file before you read it. Rubric: 1 point each for reading and storing; 2 points for using 2 R commands for inspecting.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
file_path <- "election-iran-2009.csv"
file_content <- readLines(file_path, n = 5) # Read the first 5 lines
cat(file_content, sep = "\n") # Print the content
## Region,Ahmadinejad,% ,Rezai,%,Karrubi,%,Mousavi,%,Total votes,Invalid votes,Valid votes,Eligible voters,"Turnout, %"
## East Azerbaijan,"1,131,111",56.75,"16,920",0.85,"7,246",0.36,"837,858",42.04,"2,010,340","17,205","1,993,135","2,461,553",80.97
## West Azerbaijan,"623,946",47.48,"12,199",0.93,"21,609",1.64,"656,508",49.95,"1,334,356","20,094","1,314,262","1,883,144",69.79
## Ardabil,"325,911",51.11,"6,578",1.03,"2,319",0.36,"302,825",47.49,"642,005","4,372","637,633","804,881",79.22
## Isfahan,"1,799,255",68.88,"51,788",1.98,"14,579",0.56,"746,697",28.58,"2,637,482","25,163","2,612,319","2,987,946",87.43
election_data <- read.csv(file_path)
head(election_data)
## Region Ahmadinejad X. Rezai X..1 Karrubi X..2 Mousavi X..3
## 1 East Azerbaijan 1,131,111 56.75 16,920 0.85 7,246 0.36 837,858 42.04
## 2 West Azerbaijan 623,946 47.48 12,199 0.93 21,609 1.64 656,508 49.95
## 3 Ardabil 325,911 51.11 6,578 1.03 2,319 0.36 302,825 47.49
## 4 Isfahan 1,799,255 68.88 51,788 1.98 14,579 0.56 746,697 28.58
## 5 Ilam 199,654 64.58 5,221 1.69 7,471 2.42 96,826 31.32
## 6 Bushehr 299,357 61.37 7,608 1.56 3,563 0.73 177,268 36.34
## Total.votes Invalid.votes Valid.votes Eligible.voters Turnout...
## 1 2,010,340 17,205 1,993,135 2,461,553 80.97
## 2 1,334,356 20,094 1,314,262 1,883,144 69.79
## 3 642,005 4,372 637,633 804,881 79.22
## 4 2,637,482 25,163 2,612,319 2,987,946 87.43
## 5 312,667 3,495 309,172 357,687 86.44
## 6 493,989 6,193 487,796 580,822 83.98
summary(election_data)
## Region Ahmadinejad X. Rezai
## Length:30 Length:30 Min. :46.07 Length:30
## Class :character Class :character 1st Qu.:59.38 Class :character
## Mode :character Mode :character Median :68.37 Mode :character
## Mean :65.66
## 3rd Qu.:72.90
## Max. :77.78
##
## X..1 Karrubi X..2 Mousavi
## Min. :0.6800 Length:30 Min. :0.2400 Length:30
## 1st Qu.:0.9575 Class :character 1st Qu.:0.4225 Class :character
## Median :1.1800 Mode :character Median :0.6250 Mode :character
## Mean :1.5817 Mean :0.9377
## 3rd Qu.:1.5600 3rd Qu.:1.1550
## Max. :6.9200 Max. :4.6100
##
## X..3 Total.votes Invalid.votes Valid.votes
## Min. :20.49 Length:30 Length:30 Length:30
## 1st Qu.:24.48 Class :character Class :character Class :character
## Median :28.39 Mode :character Mode :character Mode :character
## Mean :31.82
## 3rd Qu.:38.44
## Max. :51.97
##
## Eligible.voters Turnout...
## Length:30 Min. :63.41
## Class :character 1st Qu.:80.25
## Mode :character Median :86.04
## Mean :84.33
## 3rd Qu.:88.06
## Max. :99.43
## NA's :3
Describe any problems that you observe after inspecting the data. Then, fix these problems. Tip: Google read_csv. Rubric: 2 points for identifying problems; 8 points for fixing them.
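As a rough illustration of the read_csv tip, the sketch below reads a tiny stand-in file with `readr::read_csv()`, supplying clean column names and parsing the comma-grouped counts as numbers at read time. The two data rows are copied from the file preview above, while the blank last cell, the temporary file, and the name `demo` are assumptions made only for this example.

```r
library(readr)

# Two rows copied from the file preview above; the blank Eligible_voters cell in
# the second row is hypothetical and only illustrates how an empty field is read
csv_lines <- c(
  'Region,Ahmadinejad,%,Eligible voters',
  'East Azerbaijan,"1,131,111",56.75,"2,461,553"',
  'Ardabil,"325,911",51.11,'
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

demo <- read_csv(
  tmp,
  skip = 1,  # skip the original header row, since names are supplied below
  col_names = c("Region", "Ahmadinejad", "Percentage_Ahmadinejad", "Eligible_voters"),
  col_types = cols(
    Region = col_character(),
    .default = col_number()  # col_number() ignores the "," thousands separators
  ),
  na = c("", "NA")  # empty fields become NA
)

print(demo)
```

Handling names, types, and missing values in the read call is one option; cleaning the data frame after `read.csv()` reaches the same result.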
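# Problems observed:
# 1. The percentage columns all share the header "%", so read.csv() gave them uninformative names (X., X..1, ...).
# 2. Missing values are stored as empty strings rather than NA.
# 3. Vote counts were read as character (because of the "," separators), while percentages are numeric.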
library(dplyr)
# Fixing duplicated column names
print(colnames(election_data))
## [1] "Region" "Ahmadinejad" "X." "Rezai"
## [5] "X..1" "Karrubi" "X..2" "Mousavi"
## [9] "X..3" "Total.votes" "Invalid.votes" "Valid.votes"
## [13] "Eligible.voters" "Turnout..."
# Change the column names
new_column_names <- c("Region", "Ahmadinejad", "Percentage_Ahmadinejad", "Rezai", "Percentage_Rezai",
"Karrubi", "Percentage_Karrubi", "Mousavi", "Percentage_Mousavi",
"Total_votes", "Invalid_votes", "Valid_votes", "Eligible_voters", "Turnout_percentage")
colnames(election_data) <- new_column_names
# Print the updated column names
print(colnames(election_data))
## [1] "Region" "Ahmadinejad" "Percentage_Ahmadinejad"
## [4] "Rezai" "Percentage_Rezai" "Karrubi"
## [7] "Percentage_Karrubi" "Mousavi" "Percentage_Mousavi"
## [10] "Total_votes" "Invalid_votes" "Valid_votes"
## [13] "Eligible_voters" "Turnout_percentage"
eligible_voters_value <- election_data %>%
filter(Region == "South Khorasan") %>%
pull(Eligible_voters)
# Print the result
cat(eligible_voters_value)
# The missing value is stored as an empty string rather than NA, so convert empty strings to NA
election_data<- election_data%>%
mutate(across(everything(), ~ifelse(. == "", NA, .)))
# Impute missing values in the numeric columns with each column's mean
# Identify numeric columns
numeric_cols <- names(election_data)[sapply(election_data, is.numeric)]
# Replace NA with the column mean; all_of() selects the columns named in an external vector
election_data_imputed <- election_data %>%
  mutate(across(.cols = all_of(numeric_cols), ~ifelse(is.na(.), mean(., na.rm = TRUE), .)))
# Print the updated data
head(election_data_imputed)
## Region Ahmadinejad Percentage_Ahmadinejad Rezai Percentage_Rezai
## 1 East Azerbaijan 1,131,111 56.75 16,920 0.85
## 2 West Azerbaijan 623,946 47.48 12,199 0.93
## 3 Ardabil 325,911 51.11 6,578 1.03
## 4 Isfahan 1,799,255 68.88 51,788 1.98
## 5 Ilam 199,654 64.58 5,221 1.69
## 6 Bushehr 299,357 61.37 7,608 1.56
## Karrubi Percentage_Karrubi Mousavi Percentage_Mousavi Total_votes
## 1 7,246 0.36 837,858 42.04 2,010,340
## 2 21,609 1.64 656,508 49.95 1,334,356
## 3 2,319 0.36 302,825 47.49 642,005
## 4 14,579 0.56 746,697 28.58 2,637,482
## 5 7,471 2.42 96,826 31.32 312,667
## 6 3,563 0.73 177,268 36.34 493,989
## Invalid_votes Valid_votes Eligible_voters Turnout_percentage
## 1 17,205 1,993,135 2,461,553 80.97
## 2 20,094 1,314,262 1,883,144 69.79
## 3 4,372 637,633 804,881 79.22
## 4 25,163 2,612,319 2,987,946 87.43
## 5 3,495 309,172 357,687 86.44
## 6 6,193 487,796 580,822 83.98
#check the data type
str(election_data_imputed)
## 'data.frame': 30 obs. of 14 variables:
## $ Region : chr "East Azerbaijan" "West Azerbaijan" "Ardabil" "Isfahan" ...
## $ Ahmadinejad : chr "1,131,111" "623,946" "325,911" "1,799,255" ...
## $ Percentage_Ahmadinejad: num 56.8 47.5 51.1 68.9 64.6 ...
## $ Rezai : chr "16,920" "12,199" "6,578" "51,788" ...
## $ Percentage_Rezai : num 0.85 0.93 1.03 1.98 1.69 1.56 1.99 4.61 1.04 1.42 ...
## $ Karrubi : chr "7,246" "21,609" "2,319" "14,579" ...
## $ Percentage_Karrubi : num 0.36 1.64 0.36 0.56 2.42 0.73 0.91 0.84 0.24 0.43 ...
## $ Mousavi : chr "837,858" "656,508" "302,825" "746,697" ...
## $ Percentage_Mousavi : num 42 50 47.5 28.6 31.3 ...
## $ Total_votes : chr "2,010,340" "1,334,356" "642,005" "2,637,482" ...
## $ Invalid_votes : chr "17,205" "20,094" "4,372" "25,163" ...
## $ Valid_votes : chr "1,993,135" "1,314,262" "637,633" "2,612,319" ...
## $ Eligible_voters : chr "2,461,553" "1,883,144" "804,881" "2,987,946" ...
## $ Turnout_percentage : num 81 69.8 79.2 87.4 86.4 ...
# Convert the character vote-count columns to numeric by stripping the thousands separators
election_data_imputed <- election_data_imputed %>%
mutate_at(vars(Ahmadinejad, Rezai, Karrubi, Mousavi, Total_votes, Invalid_votes, Valid_votes, Eligible_voters),
~ as.numeric(gsub(",", "", .)))
# Fill the three missing Eligible_voters values from the turnout relation:
# Turnout_percentage = Valid_votes / Eligible_voters * 100
election_data_imputed <- election_data_imputed %>%
  mutate(
    Eligible_voters = ifelse(
      is.na(Eligible_voters),
      Valid_votes / Turnout_percentage * 100,
      Eligible_voters
    )
  )
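The back-calculation of Eligible_voters assumes that turnout is valid votes divided by eligible voters. A quick check on two rows from the data preview supports that reading. The vector names below are illustrative only, and where a row's turnout was itself filled in with a column mean, the recovered figure is an estimate rather than the true value.

```r
# Figures for East Azerbaijan and Ardabil, taken from the data preview above
valid_votes      <- c(1993135, 637633)
eligible_voters  <- c(2461553, 804881)
reported_turnout <- c(80.97, 79.22)

# Turnout as valid votes over eligible voters, in percent
computed_turnout <- round(valid_votes / eligible_voters * 100, 2)
print(computed_turnout)

# Inverting the same relation recovers eligible voters up to rounding error,
# because the reported turnout is rounded to two decimals
recovered_eligible <- valid_votes / reported_turnout * 100
print(round(recovered_eligible))
```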
Apply Benford’s Law to analyze the results on the data. Tip: Find another exercise on the web that applies Benford’s Analysis and then apply what you learned to your analysis. I am not expecting anything beyond 1st digit analysis in this question.
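To make the idea of a 1st digit analysis concrete, here is a minimal base-R sketch that extracts leading digits and compares their observed shares with Benford's expected shares. It uses only the six Ahmadinejad counts visible in the data preview, which is far too few to support any conclusion, and all object names are illustrative.

```r
# Ahmadinejad vote counts for the first six regions in the data preview above
votes <- c(1131111, 623946, 325911, 1799255, 199654, 299357)

# Leading digit of a positive number: scale it into [1, 10) and drop the fraction
first_digit <- floor(votes / 10^floor(log10(votes)))

# Observed counts for digits 1..9 (digits that never occur get a count of 0)
observed <- tabulate(first_digit, nbins = 9)

# Benford's expected share for each leading digit
expected_share <- log10(1 + 1 / (1:9))

comparison <- data.frame(
  digit = 1:9,
  observed_share = observed / sum(observed),
  expected_share = expected_share
)
print(round(comparison, 3))

# Mean absolute deviation between observed and expected shares
mad_first_digit <- mean(abs(comparison$observed_share - comparison$expected_share))
print(mad_first_digit)
```

Run on a full column of 30 regions, the same steps give the observed first-digit distribution that a Benford test summarizes.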
# Install and load the benford.analysis package
#install.packages("benford.analysis")
library(benford.analysis)
## Warning: package 'benford.analysis' was built under R version 4.3.2
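# election_data_imputed holds the cleaned data from the previous step
# Flag the numeric columns ('Region' is character, so it is excluded)
numeric_cols <- sapply(election_data_imputed, is.numeric) & names(election_data_imputed) != "Region"
# Apply Benford's Law analysis to each numeric column
# Caveat: benford() defaults to number.of.digits = 2, so passing single digits makes it
# report two-digit bins (10, 20, ...) in the output below. Values below 1 also yield a
# leading "0" here and are dropped. A first-digit-only run would instead pass the raw
# column with number.of.digits = 1.
benford_results <- lapply(election_data_imputed[, numeric_cols, drop = FALSE], function(column) {
  # Extract the first character of each value as a digit
  first_digits <- as.numeric(substr(as.character(column), 1, 1))
  # Run the Benford analysis on the extracted digits
  benford(first_digits)
})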
# Print the results for each column
for (i in seq_along(benford_results)) {
print(paste("Column:", names(benford_results)[i]))
print(benford_results[[i]])
}
## [1] "Column: Ahmadinejad"
##
## Benford object:
##
## Data: first_digits
## Number of observations used = 30
## Number of obs. for second order = 7
## First digits analysed = 2
##
## Mantissa:
##
## Statistic Value
## Mean 0.428
## Var 0.084
## Ex.Kurtosis -1.024
## Skewness -0.238
##
##
## The 5 largest deviations:
##
## digits absolute.diff
## 1 10 5.76
## 2 30 5.57
## 3 40 4.68
## 4 20 4.36
## 5 50 2.74
##
## Stats:
##
## Pearson's Chi-squared test
##
## data: first_digits
## X-squared = 276.58, df = 89, p-value < 2.2e-16
##
##
## Mantissa Arc Test
##
## data: first_digits
## L2 = 0.026533, df = 2, p-value = 0.4511
##
## Mean Absolute Deviation (MAD): 0.01968257
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Percentage_Ahmadinejad"
##
## Benford object:
##
## Data: first_digits
## Number of observations used = 30
## Number of obs. for second order = 3
## First digits analysed = 2
##
## Mantissa:
##
## Statistic Value
## Mean 0.7796
## Var 0.0054
## Ex.Kurtosis -0.0144
## Skewness -0.9327
##
##
## The 5 largest deviations:
##
## digits absolute.diff
## 1 70 12.82
## 2 60 8.78
## 3 50 5.74
## 4 40 1.68
## 5 10 1.24
##
## Stats:
##
## Pearson's Chi-squared test
##
## data: first_digits
## X-squared = 1412.5, df = 89, p-value < 2.2e-16
##
##
## Mantissa Arc Test
##
## data: first_digits
## L2 = 0.81259, df = 2, p-value = 2.588e-11
##
## Mean Absolute Deviation (MAD): 0.02149638
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Rezai"
##
## Benford object:
##
## Data: first_digits
## Number of observations used = 30
## Number of obs. for second order = 7
## First digits analysed = 2
##
## Mantissa:
##
## Statistic Value
## Mean 0.419
## Var 0.142
## Ex.Kurtosis -1.756
## Skewness -0.063
##
##
## The 5 largest deviations:
##
## digits absolute.diff
## 1 10 10.76
## 2 70 4.82
## 3 50 2.74
## 4 40 2.68
## 5 80 1.84
##
## Stats:
##
## Pearson's Chi-squared test
##
## data: first_digits
## X-squared = 336.02, df = 89, p-value < 2.2e-16
##
##
## Mantissa Arc Test
##
## data: first_digits
## L2 = 0.26339, df = 2, p-value = 0.0003702
##
## Mean Absolute Deviation (MAD): 0.01966932
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Percentage_Rezai"
##
## Benford object:
##
## Data: first_digits
## Number of observations used = 21
## Number of obs. for second order = 3
## First digits analysed = 2
##
## Mantissa:
##
## Statistic Value
## Mean 0.094
## Var 0.048
## Ex.Kurtosis 3.579
## Skewness 2.204
##
##
## The 5 largest deviations:
##
## digits absolute.diff
## 1 10 16.13
## 2 20 1.56
## 3 60 0.85
## 4 11 0.79
## 5 40 0.77
##
## Stats:
##
## Pearson's Chi-squared test
##
## data: first_digits
## X-squared = 331.54, df = 89, p-value < 2.2e-16
##
##
## Mantissa Arc Test
##
## data: first_digits
## L2 = 0.56232, df = 2, p-value = 7.44e-06
##
## Mean Absolute Deviation (MAD): 0.02043368
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Karrubi"
##
## Benford object:
##
## Data: first_digits
## Number of observations used = 30
## Number of obs. for second order = 7
## First digits analysed = 2
##
## Mantissa:
##
## Statistic Value
## Mean 0.36
## Var 0.10
## Ex.Kurtosis -1.25
## Skewness 0.27
##
##
## The 5 largest deviations:
##
## digits absolute.diff
## 1 10 8.76
## 2 20 7.36
## 3 40 4.68
## 4 70 2.82
## 5 11 1.13
##
## Stats:
##
## Pearson's Chi-squared test
##
## data: first_digits
## X-squared = 295.42, df = 89, p-value < 2.2e-16
##
##
## Mantissa Arc Test
##
## data: first_digits
## L2 = 0.027667, df = 2, p-value = 0.436
##
## Mean Absolute Deviation (MAD): 0.01968257
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Percentage_Karrubi"
##
## Benford object:
##
## Data: first_digits
## Number of observations used = 9
## Number of obs. for second order = 2
## First digits analysed = 2
##
## Mantissa:
##
## Statistic Value
## Mean 0.134
## Var 0.048
## Ex.Kurtosis 0.170
## Skewness 1.238
##
##
## The 5 largest deviations:
##
## digits absolute.diff
## 1 10 5.63
## 2 20 1.81
## 3 40 0.90
## 4 11 0.34
## 5 12 0.31
##
## Stats:
##
## Pearson's Chi-squared test
##
## data: first_digits
## X-squared = 118.97, df = 89, p-value = 0.01861
##
##
## Mantissa Arc Test
##
## data: first_digits
## L2 = 0.27851, df = 2, p-value = 0.08155
##
## Mean Absolute Deviation (MAD): 0.0205932
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Mousavi"
##
## Benford object:
##
## Data: first_digits
## Number of observations used = 30
## Number of obs. for second order = 8
## First digits analysed = 2
##
## Mantissa:
##
## Statistic Value
## Mean 0.49
## Var 0.12
## Ex.Kurtosis -1.33
## Skewness -0.16
##
##
## The 5 largest deviations:
##
## digits absolute.diff
## 1 10 5.76
## 2 30 4.57
## 3 20 4.36
## 4 90 2.86
## 5 70 2.82
##
## Stats:
##
## Pearson's Chi-squared test
##
## data: first_digits
## X-squared = 285.87, df = 89, p-value < 2.2e-16
##
##
## Mantissa Arc Test
##
## data: first_digits
## L2 = 0.042186, df = 2, p-value = 0.2821
##
## Mean Absolute Deviation (MAD): 0.01956268
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Percentage_Mousavi"
##
## Benford object:
##
## Data: first_digits
## Number of observations used = 30
## Number of obs. for second order = 3
## First digits analysed = 2
##
## Mantissa:
##
## Statistic Value
## Mean 0.416
## Var 0.018
## Ex.Kurtosis -1.200
## Skewness 0.564
##
##
## The 5 largest deviations:
##
## digits absolute.diff
## 1 20 15.36
## 2 30 6.57
## 3 40 5.68
## 4 10 1.24
## 5 11 1.13
##
## Stats:
##
## Pearson's Chi-squared test
##
## data: first_digits
## X-squared = 603.19, df = 89, p-value < 2.2e-16
##
##
## Mantissa Arc Test
##
## data: first_digits
## L2 = 0.47544, df = 2, p-value = 6.391e-07
##
## Mean Absolute Deviation (MAD): 0.02100547
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Total_votes"
##
## Benford object:
##
## Data: first_digits
## Number of observations used = 30
## Number of obs. for second order = 8
## First digits analysed = 2
##
## Mantissa:
##
## Statistic Value
## Mean 0.54
## Var 0.10
## Ex.Kurtosis -0.94
## Skewness -0.48
##
##
## The 5 largest deviations:
##
## digits absolute.diff
## 1 30 4.57
## 2 60 3.78
## 3 10 3.76
## 4 20 3.36
## 5 90 2.86
##
## Stats:
##
## Pearson's Chi-squared test
##
## data: first_digits
## X-squared = 308.99, df = 89, p-value < 2.2e-16
##
##
## Mantissa Arc Test
##
## data: first_digits
## L2 = 0.05855, df = 2, p-value = 0.1726
##
## Mean Absolute Deviation (MAD): 0.01956268
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Invalid_votes"
##
## Benford object:
##
## Data: first_digits
## Number of observations used = 30
## Number of obs. for second order = 8
## First digits analysed = 2
##
## Mantissa:
##
## Statistic Value
## Mean 0.39
## Var 0.12
## Ex.Kurtosis -1.40
## Skewness 0.16
##
##
## The 5 largest deviations:
##
## digits absolute.diff
## 1 10 8.76
## 2 20 5.36
## 3 50 3.74
## 4 30 2.57
## 5 90 1.86
##
## Stats:
##
## Pearson's Chi-squared test
##
## data: first_digits
## X-squared = 251.3, df = 89, p-value < 2.2e-16
##
##
## Mantissa Arc Test
##
## data: first_digits
## L2 = 0.055737, df = 2, p-value = 0.1878
##
## Mean Absolute Deviation (MAD): 0.01956268
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Valid_votes"
##
## Benford object:
##
## Data: first_digits
## Number of observations used = 30
## Number of obs. for second order = 8
## First digits analysed = 2
##
## Mantissa:
##
## Statistic Value
## Mean 0.53
## Var 0.11
## Ex.Kurtosis -1.04
## Skewness -0.47
##
##
## The 5 largest deviations:
##
## digits absolute.diff
## 1 10 4.76
## 2 30 4.57
## 3 90 2.86
## 4 70 2.82
## 5 60 2.78
##
## Stats:
##
## Pearson's Chi-squared test
##
## data: first_digits
## X-squared = 293.71, df = 89, p-value < 2.2e-16
##
##
## Mantissa Arc Test
##
## data: first_digits
## L2 = 0.077866, df = 2, p-value = 0.09671
##
## Mean Absolute Deviation (MAD): 0.01956268
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Eligible_voters"
##
## Benford object:
##
## Data: first_digits
## Number of observations used = 30
## Number of obs. for second order = 8
## First digits analysed = 2
##
## Mantissa:
##
## Statistic Value
## Mean 0.46
## Var 0.13
## Ex.Kurtosis -1.51
## Skewness -0.16
##
##
## The 5 largest deviations:
##
## digits absolute.diff
## 1 10 7.76
## 2 20 3.36
## 3 80 2.84
## 4 60 2.78
## 5 50 2.74
##
## Stats:
##
## Pearson's Chi-squared test
##
## data: first_digits
## X-squared = 263.21, df = 89, p-value < 2.2e-16
##
##
## Mantissa Arc Test
##
## data: first_digits
## L2 = 0.11771, df = 2, p-value = 0.02927
##
## Mean Absolute Deviation (MAD): 0.01956268
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
## [1] "Column: Turnout_percentage"
##
## Benford object:
##
## Data: first_digits
## Number of observations used = 30
## Number of obs. for second order = 3
## First digits analysed = 2
##
## Mantissa:
##
## Statistic Value
## Mean 0.8956
## Var 0.0019
## Ex.Kurtosis 1.4428
## Skewness -1.1274
##
##
## The 5 largest deviations:
##
## digits absolute.diff
## 1 80 18.84
## 2 90 4.86
## 3 70 3.82
## 4 60 1.78
## 5 10 1.24
##
## Stats:
##
## Pearson's Chi-squared test
##
## data: first_digits
## X-squared = 2479.2, df = 89, p-value < 2.2e-16
##
##
## Mantissa Arc Test
##
## data: first_digits
## L2 = 0.92952, df = 2, p-value = 7.751e-13
##
## Mean Absolute Deviation (MAD): 0.02169927
## MAD Conformity - Nigrini (2012): Nonconformity
## Distortion Factor: NaN
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
# Visualize the expected vs. observed distribution for each column
for (i in seq_along(benford_results)) {
plot(benford_results[[i]], main = paste("Benford's Law Analysis -", names(benford_results)[i]))
}
Knit to html after eliminating all the errors. Submit both the Rmd and html files. Tip: Do not worry about minor formatting issues.
### This section doesn't require code. Just knit and submit the Rmd and html files.###
Compare and analyze the results with your team and answer the following question: Argue your case for or against the election fraud based on your analysis. Explain your results in the process and suggest next steps for data analysts (not legal/political professionals). Tip: Individual and Team assignments are related and you may need a few iterations after discussing with your team. You may change your code as you consider and develop different approaches. You must clearly take a position whether you have a case based on data, not opinions. Tip: You can share what other analysis you would continue doing if this was a problem you were solving IRL (In Real Life).