Raking Illustration and Methodology

Raking is a poststratification method that can be used when poststrata are formed using more than one variable, but only the marginal population totals are known.

Raking was first used in the 1940 census to ensure that the complete census data and samples taken from it gave consistent results; it was introduced in Deming and Stephan (1940). Brackstone and Rao (1976) further developed the theory, and Oh and Scheuren (1983) describe raking ratio estimates for nonresponse.

Consider the following table of sums of weights from a sample:

Gender   Black    White     Asian    Native American   Other   Sum of Weights
Female   300      1200      60       30                30      1620
Male     150      1080      90       30                30      1380
Sum      450      2280      150      60                60      3000

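Population marginal totals (the known totals that the weights should sum to):

Female 1510, Male 1490
Black 600, White 2120, Asian 150, Native American 100, Other 30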

First, adjust the rows by multiplying each cell in the female row by 1510/1620, and each cell in the male row by 1490/1380:

Gender   Black    White     Asian    Native American   Other   Sum of Weights
Female   279.63   1118.52   55.93    27.96             27.96   1510
Male     161.96   1166.09   97.17    32.39             32.39   1490
Total    441.59   2284.61   153.10   60.35             60.35   3000

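Next, adjust the columns by multiplying each cell by the population total for its column divided by the current column total: 600/441.59 for Black, 2120/2284.61 for White, 150/153.10 for Asian, 100/60.35 for Native American, and 30/60.35 for Other.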

Resulting table:

Gender   Black    White     Asian    Native American   Other   Sum of Weights
Female   379.94   1037.93   54.79    46.33             13.90   1532.90
Male     220.06   1082.07   95.21    53.67             16.10   1467.10
Total    600.00   2120.00   150.00   100.00            30.00   3000.00

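The column totals are now correct, but the row sums (1532.90 and 1467.10) no longer match the population totals of 1510 and 1490. Repeat the row adjustment, then the column adjustment, and continue alternating until the table stops changing. After convergence: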

Gender   Black    White     Asian    Native American   Other   Sum of Weights
Female   375.59   1021.47   53.72    45.56             13.67   1510
Male     224.41   1098.53   96.28    54.44             16.33   1490
Total    600.00   2120.00   150.00   100.00            30.00   3000

These adjusted weights now match the marginal population totals. The weighting-adjustment factor for white males is 1098.53 / 1080 ≈ 1.017, so the weight of each white male increases slightly. The factor for white females is 1021.47 / 1200 ≈ 0.851; their weights decrease because white females are overrepresented in the sample.
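The alternating row and column adjustments can be sketched in a few lines of Python. This is a minimal illustration of the procedure, not the code that produced the tables; the function name rake_table, the tolerance, and the iteration cap are assumptions made for this example.

```python
def rake_table(cells, row_targets, col_targets, tol=1e-6, max_iter=100):
    """Alternate row and column scaling until both sets of margins match."""
    table = [list(row) for row in cells]
    for _ in range(max_iter):
        # Row step: scale each row so that it sums to its population total.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            table[i] = [value * target / row_sum for value in table[i]]
        # Column step: scale each column so that it sums to its population total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
        # Columns are exact after the column step; stop once rows agree too.
        if all(abs(sum(row) - t) < tol for row, t in zip(table, row_targets)):
            break
    return table


# Sums of weights from the sample (rows: Female, Male;
# columns: Black, White, Asian, Native American, Other).
sample = [[300, 1200, 60, 30, 30],
          [150, 1080, 90, 30, 30]]
raked = rake_table(sample,
                   row_targets=[1510, 1490],
                   col_targets=[600, 2120, 150, 100, 30])

for label, row in zip(["Female", "Male"], raked):
    print(label, [round(value, 2) for value in row])
```

The printed rows should agree with the converged table up to rounding. Dividing a raked cell by the corresponding original cell gives the weighting-adjustment factor for that group.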

Assumptions:

Methodological Overview: Iterative Raking and Analysis Workflow

This section details the statistical method used to analyze the Nilambur survey data and provides a step-by-step summary of the entire process, from initial data cleaning to the final weighted analysis.

What is Iterative Raking?

Iterative Raking (also known as RIM weighting or sample balancing) is a statistical procedure used to adjust the weights of survey respondents to make the sample more representative of a target population. In survey research, it is rare for a sample’s demographic profile to perfectly mirror the population it comes from. For instance, a survey might happen to include a higher percentage of young people or a lower percentage of women than exist in the actual population. This discrepancy, known as sampling bias, can lead to inaccurate conclusions.

Raking corrects for this bias. The process works by iteratively adjusting the weight of each respondent until the sample’s marginal distributions for key demographic variables (e.g., the percentage of males and females, the percentage of different age groups) match the known distributions of those same variables in the true population.

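A key feature of raking is that it requires only the marginal population totals, not the joint totals. For example, to rake by age and gender, we only need to know the population share of each age group (18-23, 24-30, and so on) and, separately, the population share of males and females.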

We do not need to know the joint distribution (e.g., the exact number of males aged 18-23). This makes it a very powerful and flexible technique. By ensuring our sample’s demographics align with reality, we can produce more accurate and reliable estimates for our variables of interest, such as MLA and government preference.

Summary of the Analysis Workflow:

The analysis of the Nilambur survey data was conducted in several sequential steps:

Data Loading and Cleaning:

The raw survey data was loaded from the .csv file. Relevant columns were selected, and several variables (Category_Caste_1, Pref_Govt, Gender, Age) were recoded from their numeric codes into descriptive character strings for clarity. Essential data cleaning was performed, such as correcting spellings and standardizing category names to ensure consistency. A Broad_Category variable was created to group detailed caste categories into broader religious and social groups for the weighting process.

Exploratory Data Analysis (Unweighted):

Before applying any weights, an initial analysis was performed on the raw data. This involved calculating the unweighted proportions for key variables like MLA preference. Crucially, this step also involved comparing the demographic profile of the survey respondents (for Gender, Age, and Broad Category) against the known actual population data for Nilambur. This comparison highlighted the specific demographic imbalances in our sample that needed to be corrected.

Preparation for Raking:

The known population proportions for Gender, Age, and Broad_Category were used to create population target tables. These tables define the exact distributions that our weighted sample should match.

Survey Design and Raking:

Using the survey package in R, an unweighted survey design object was created, assigning an initial weight of 1 to every respondent. The rake() function was then applied. This function took our survey data and the population target tables as input and performed the iterative raking procedure, generating a new set of weights for each respondent.

Verification and Final Analysis:

The success of the raking process was verified by using the svymean() function to check the weighted proportions of the raking variables (Gender, Age, Broad_Category). The output confirmed that the weighted sample now mirrored the population distributions.

Generating Weighted Results:

With the new weights, the final analysis was conducted.

The weighted proportions for MLA Preference and Government Preference were calculated.

To provide a measure of uncertainty, the Standard Error (SE) and the Margin of Error (MOE) at the 95% confidence level were also calculated for the weighted estimates.

A final summary table was produced, comparing the initial Unweighted Proportions against the final, more accurate Weighted Proportions.

Finally, a weighted crosstabulation was created using svyby() to explore the relationship between MLA choice and preferred government, providing deeper insights into the electorate’s preferences.

Weighted Score Survey Tracker for UDF, LDF, BJP and IND for all the rounds

MLA Unweighted_Prop
UDF 47.2
LDF 37.5
IND_Anvar 6.4
CNS 4.8
BJP 3.3
SDPI 0.6
Others 0.3

Caste/Community Percentage as per Survey

Caste_1 Survey_%
Muslim 49.2
Christian 15.2
Ezhava 13.6
Nair 6.8
Others 3.9
Vishavakarma 2.9
Kalladi 1.9
Mannan 1.6
Kanakkan 1.4
Paniya/Paniyar/Payner/Paniyan 1.1

Age Groups Prop as per Survey (Respondents) Compared With Actuals (Voters)

Age Actual_Prop Survey_Prop
18-23 0.0900 0.0566
24-30 0.1431 0.0855
31-45 0.3447 0.2717
46-60 0.2538 0.3308
60+ 0.1684 0.2553

Broad Religion and Caste_Category Prop as per Survey Compared With Actuals

Broad_Category Actual_Prop Survey_Prop
Muslim 0.4390 0.4918
Hindu 0.3424 0.2629
Christian 0.1080 0.1522
Hindu_SC 0.0772 0.0528
Hindu_ST 0.0334 0.0327

Gender Prop as per Survey Compared With Actuals

Gender Actual_Prop Survey_Prop
Male 0.48836 0.5572
Female 0.51161 0.4428
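The three comparison tables can be condensed into a single ratio per group, actual proportion divided by survey proportion. The Python sketch below does this with the figures copied from the tables; the variable names are chosen for this example. The ratio is only a first indication of the direction of the adjustment, because raking balances all three variables at once and the final weights will differ from these single-variable ratios.

```python
# Proportions copied from the comparison tables: (actual, survey).
profiles = {
    "Age": {
        "18-23": (0.0900, 0.0566),
        "24-30": (0.1431, 0.0855),
        "31-45": (0.3447, 0.2717),
        "46-60": (0.2538, 0.3308),
        "60+": (0.1684, 0.2553),
    },
    "Broad_Category": {
        "Muslim": (0.4390, 0.4918),
        "Hindu": (0.3424, 0.2629),
        "Christian": (0.1080, 0.1522),
        "Hindu_SC": (0.0772, 0.0528),
        "Hindu_ST": (0.0334, 0.0327),
    },
    "Gender": {
        "Male": (0.48836, 0.5572),
        "Female": (0.51161, 0.4428),
    },
}

# Ratio above 1: the group is under-represented in the survey and needs
# more weight. Ratio below 1: it is over-represented and needs less.
ratios = {
    variable: {group: actual / survey
               for group, (actual, survey) in groups.items()}
    for variable, groups in profiles.items()
}

for variable, groups in ratios.items():
    for group, ratio in groups.items():
        print(f"{variable:15s} {group:10s} {ratio:.2f}")
```

On these figures the youngest voters and Hindu_SC respondents are the most under-represented groups, while respondents aged 60+ and Christian respondents are the most over-represented.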

Introduction to Raking

This document demonstrates how to use iterative raking (or RIM weighting) to adjust the weights of survey respondents so that the sample’s demographic profile matches known population characteristics. We have survey data from Nilambur and known population totals for Gender, Age, and Broad Religious/Caste Category. Raking will allow us to generate more accurate estimates of variables like MLA preference by correcting for over- or under-sampling of certain demographic groups.


1. Preparing for Raking: Defining Population Targets

First, we must prepare the data for the raking process. This involves two key steps:

1. Filtering our survey data to ensure there are no missing values in the variables we will use for weighting (Gender, Age, Broad_Category).
2. Creating “population margin” tables. The rake function in the survey package needs to know the target population counts for each category within our weighting variables. We calculate these by multiplying the known population proportions by our final sample size.

Final Sample Size after removing NAs: 789

Target Population Counts for Gender

Gender   Freq
Male     385.3160
Female   403.6603

Target Population Counts for Age

Age      Freq
18-23    71.0100
24-30    112.9059
31-45    271.9683
46-60    200.2482
60+      132.8676

Target Population Counts for Broad Category

Broad_Category   Freq
Muslim           346.3710
Hindu            270.1536
Christian        85.2120
Hindu_SC         60.9108
Hindu_ST         26.3526
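The target counts are simply the population proportions multiplied by the final sample size of 789. The Python sketch below reproduces that arithmetic; the analysis itself was run in R, and the names SAMPLE_SIZE and target_counts are assumptions made for this example.

```python
SAMPLE_SIZE = 789  # respondents left after removing missing values

# Known population proportions for the three weighting variables.
population_props = {
    "Gender": {"Male": 0.48836, "Female": 0.51161},
    "Age": {"18-23": 0.0900, "24-30": 0.1431, "31-45": 0.3447,
            "46-60": 0.2538, "60+": 0.1684},
    "Broad_Category": {"Muslim": 0.4390, "Hindu": 0.3424, "Christian": 0.1080,
                       "Hindu_SC": 0.0772, "Hindu_ST": 0.0334},
}


def target_counts(proportions, n):
    """Scale population proportions to counts on the sample-size scale."""
    return {category: share * n for category, share in proportions.items()}


targets = {variable: target_counts(props, SAMPLE_SIZE)
           for variable, props in population_props.items()}

for variable, counts in targets.items():
    print(variable)
    for category, count in counts.items():
        print(f"  {category:10s} {count:.4f}")
```

Scaling the targets to the sample size, rather than to the number of voters, keeps the raked weights centred on 1. The gender proportions as given sum to 0.99997, so the two gender targets total about 788.98 rather than exactly 789.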

2. Creating the Survey Design and Raking

With our targets defined, we can now use the survey package. We first create an unweighted survey design object, giving each respondent an initial weight of 1. Then, we apply the rake() function. It iteratively adjusts the respondent weights until the sample’s marginal distributions for Gender, Age, and Broad Category match our specified population targets.

Summary of the new survey weights:

Min.     1st Qu.  Median   Mean     3rd Qu.  Max.
0.5077   0.6101   0.9484   1.0000   1.1719   2.3742

Because the population targets were scaled to the sample size, the raked weights should average 1 when raking succeeds. In this exercise the mean weight is 1.0000, with individual weights ranging from 0.5077 to 2.3742.
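To make the respondent-level procedure concrete, here is a minimal pure-Python sketch of raking individual weights. It is not the survey package's rake() implementation; the function name rake_weights, the tolerance, and the ten-respondent sample with its two variables are hypothetical and serve only to show the mechanics.

```python
def rake_weights(respondents, targets, tol=1e-8, max_iter=200):
    """Return one weight per respondent so weighted margins match `targets`.

    respondents: list of dicts, e.g. {"Gender": "Male", "Age": "45+"}
    targets:     {variable: {category: target count}}
    """
    weights = [1.0] * len(respondents)  # every respondent starts at weight 1
    for _ in range(max_iter):
        max_change = 0.0
        for variable, margin in targets.items():
            # Current weighted total in each category of this variable.
            totals = dict.fromkeys(margin, 0.0)
            for person, weight in zip(respondents, weights):
                totals[person[variable]] += weight
            # Scale each respondent by target / current total for their category.
            for i, person in enumerate(respondents):
                category = person[variable]
                factor = margin[category] / totals[category]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:  # a full pass changed nothing: converged
            break
    return weights


# Hypothetical sample of 10 respondents (illustration only).
sample = ([{"Gender": "Male", "Age": "Under 45"}] * 2
          + [{"Gender": "Male", "Age": "45+"}] * 4
          + [{"Gender": "Female", "Age": "Under 45"}] * 2
          + [{"Gender": "Female", "Age": "45+"}] * 2)

# Hypothetical targets, scaled to the sample size of 10.
targets = {"Gender": {"Male": 4.9, "Female": 5.1},
           "Age": {"Under 45": 5.8, "45+": 4.2}}

weights = rake_weights(sample, targets)
print([round(w, 3) for w in weights])
print("mean weight:", round(sum(weights) / len(weights), 4))
```

Because the hypothetical targets sum to the sample size, the mean weight comes out at 1, which is the same check applied to the summary of weights above.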


3. Verifying the Raking Process

It is crucial to check if the raking was successful. We can do this by calculating the weighted proportions of our raking variables in the sample. The results should now closely match the known population proportions. Using svymean provides a clear, simple table of the final proportions.

Weighted Gender Proportions (%)

Category       mean   SE
GenderFemale   51.2   0.003
GenderMale     48.8   0.003

Weighted Age Proportions (%)

Category   mean   SE
Age18-23   9.0    0.001
Age24-30   14.3   0.001
Age31-45   34.5   0.001
Age46-60   25.4   0.001
Age60+     16.8   0.001

Weighted Broad Category Proportions (%)

Category                  mean   SE
Broad_CategoryChristian   10.8   0
Broad_CategoryHindu       34.2   0
Broad_CategoryHindu_SC    7.7    0
Broad_CategoryHindu_ST    3.3    0
Broad_CategoryMuslim      43.9   0
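The verification step amounts to computing weighted shares and comparing them with the population shares. A minimal Python sketch of that check follows; it stands in for the svymean() call rather than reproducing it, and the respondents, weights, and population shares in it are hypothetical.

```python
from collections import defaultdict


def weighted_proportions(categories, weights):
    """Weighted share of each category, in percent."""
    totals = defaultdict(float)
    for category, weight in zip(categories, weights):
        totals[category] += weight
    grand_total = sum(totals.values())
    return {c: 100.0 * t / grand_total for c, t in totals.items()}


# Hypothetical respondents and raked weights (illustration only).
gender = ["Male", "Male", "Female", "Female", "Female"]
weights = [0.9, 1.1, 1.2, 0.8, 1.0]
shares = weighted_proportions(gender, weights)

# Hypothetical population shares in percent; after successful raking the
# largest gap between weighted and population shares should be near zero.
population = {"Male": 40.0, "Female": 60.0}
largest_gap = max(abs(shares[c] - population[c]) for c in population)

print(shares)
print("largest gap (percentage points):", round(largest_gap, 6))
```

Reporting the largest gap as a single number makes it easy to confirm convergence for all raking variables at a glance.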

4. Comparing Unweighted vs. Weighted Results

Now for the final step: analyzing our variable of interest, MLA preference. We will calculate the weighted proportions and compare them to the original, unweighted results. We will also include the Standard Error (SE), which measures the variability of our estimate, and the Margin of Error (MOE) at the 95% confidence level, which is the half-width of the interval around the estimate within which the true population value likely falls.

Comparison of Unweighted vs. Weighted MLA Preference (%)
MLA         Unweighted Prop (%)   Weighted Prop (%)   Std. Error   Margin of Error (95% CI)
UDF         47.28                 45.40               1.89         3.71
LDF         37.39                 39.21               1.86         3.65
IND_Anvar   6.46                  5.96                0.87         1.70
CNS         4.82                  4.53                0.76         1.49
BJP         3.30                  4.20                0.83         1.63
SDPI        0.63                  0.62                0.29         0.56
Others      0.13                  0.08                0.08         0.15
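The margin of error column follows from the standard error column. The Python sketch below recomputes it, assuming the usual normal-approximation multiplier of 1.96 for a 95% level; the report does not state the multiplier, so that value is an assumption.

```python
Z_95 = 1.96  # assumed normal multiplier for a 95% confidence level

# Standard errors and reported margins of error (percentage points),
# copied from the comparison table.
std_errors = {"UDF": 1.89, "LDF": 1.86, "IND_Anvar": 0.87, "CNS": 0.76,
              "BJP": 0.83, "SDPI": 0.29, "Others": 0.08}
reported_moe = {"UDF": 3.71, "LDF": 3.65, "IND_Anvar": 1.70, "CNS": 1.49,
                "BJP": 1.63, "SDPI": 0.56, "Others": 0.15}

# Margin of error = multiplier x standard error.
moe = {party: Z_95 * se for party, se in std_errors.items()}

for party in std_errors:
    print(f"{party:10s} recomputed {moe[party]:.2f}  reported {reported_moe[party]:.2f}")
```

The recomputed values agree with the table to within about 0.01. Differences of that size are what one would expect if the reported margins were calculated from unrounded standard errors.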

Weighted Difference Between UDF and LDF Vote Shares (raked design, percentage points)
Weighted_Diff SE.diff MOE.diff
6.197 3.498 6.856
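The figures for the difference can be examined with a short Python sketch. It uses only numbers reported in this document; the multiplier of 1.96 and the variable names are assumptions made for this example. It builds the 95% interval for the lead and shows why the standard error of the difference is larger than a calculation that treats the two shares as independent would suggest.

```python
import math

diff, se_diff = 6.197, 3.498   # weighted UDF minus LDF lead and its SE (points)
se_udf, se_ldf = 1.89, 1.86    # SEs of the two weighted shares (points)

# 95% interval for the lead, assuming a normal multiplier of 1.96.
moe_diff = 1.96 * se_diff
lower, upper = diff - moe_diff, diff + moe_diff
includes_zero = lower <= 0.0 <= upper

# If the two shares were independent, the SE of the difference would be:
naive_se = math.sqrt(se_udf ** 2 + se_ldf ** 2)

# Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b), so the reported SE implies:
implied_cov = (se_udf ** 2 + se_ldf ** 2 - se_diff ** 2) / 2

print(f"MOE of difference: {moe_diff:.3f}")
print(f"95% interval: {lower:.3f} to {upper:.3f} (includes zero: {includes_zero})")
print(f"naive SE: {naive_se:.3f}, implied covariance: {implied_cov:.3f}")
```

The implied covariance is negative, as expected for shares of the same sample: a respondent who chooses UDF cannot also choose LDF, so the two estimates move in opposite directions and the uncertainty of their difference grows.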

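The expected difference between UDF and LDF votes (a UDF victory) is 10824 votes. However, the margin of error of the difference (6.856 percentage points) exceeds the estimated lead (6.197 percentage points), so the 95% interval for the lead includes zero.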

Weighted Government Preference

Choice of Government
Govt mean SE
UDF 55.4 1.9
LDF 34.2 1.8
NDA 5.7 0.9
CNS 3.6 0.7
Others 1.0 0.4

Government Preference by MLA Choice (weighted row percentages)
MLA_Choice UDF LDF NDA CNS Others
BJP 11.0 12.6 76.4 0.0 0.0
CNS 51.6 14.0 2.0 29.6 2.8
IND_Anvar 73.2 13.5 3.5 7.8 2.0
LDF 16.9 76.6 2.7 3.3 0.4
Others 0.0 0.0 100.0 0.0 0.0
SDPI 63.8 12.4 0.0 0.0 23.8
UDF 90.8 4.6 2.3 1.1 1.1
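Each row of the crosstabulation sums to 100, so the cells are weighted row percentages: within each MLA choice, the weighted share preferring each government. A minimal Python sketch of that calculation is given below. It is an illustration rather than the svyby() call used in the analysis, and the four respondents and their weights are hypothetical.

```python
from collections import defaultdict


def weighted_row_percentages(rows, cols, weights):
    """Weighted share of each column category within each row category (%)."""
    cell_totals = defaultdict(float)
    row_totals = defaultdict(float)
    for row, col, weight in zip(rows, cols, weights):
        cell_totals[(row, col)] += weight
        row_totals[row] += weight
    return {(row, col): 100.0 * total / row_totals[row]
            for (row, col), total in cell_totals.items()}


# Hypothetical respondents (illustration only).
mla_choice = ["UDF", "UDF", "LDF", "LDF"]
govt_choice = ["UDF", "LDF", "LDF", "LDF"]
weights = [1.5, 0.5, 1.0, 1.0]

table = weighted_row_percentages(mla_choice, govt_choice, weights)
for (mla, govt), pct in sorted(table.items()):
    print(f"MLA {mla} -> Govt {govt}: {pct:.1f}%")
```

Rows based on very few respondents, such as Others in the table above, produce extreme percentages like 0.0 and 100.0 and should be read with that in mind.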

Check the Quality and Method of Sample

Polling booths (PB) surveyed: 2 7 13 18 23 28 34 39 44 49 55 60 65 70 76 81 86 91 97 102 107 112 118 123 128 134 139 144 149 155 160 165 170 176 181 186 191 197 202 207 212 218 223 228 233 239 244 249 254 260

Average number of respondents per booth: 15.9

Range of respondents per booth: 13 to 19

Total number of approved respondents: 795
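The booth list can be checked with a few lines of Python. The sketch below uses only the booth numbers and the respondent total given above; the variable names are chosen for this example.

```python
# Polling booth numbers copied from the list of booths surveyed.
booths = [2, 7, 13, 18, 23, 28, 34, 39, 44, 49, 55, 60, 65, 70, 76, 81, 86,
          91, 97, 102, 107, 112, 118, 123, 128, 134, 139, 144, 149, 155, 160,
          165, 170, 176, 181, 186, 191, 197, 202, 207, 212, 218, 223, 228,
          233, 239, 244, 249, 254, 260]
approved_respondents = 795

n_booths = len(booths)
average_per_booth = approved_respondents / n_booths

# Gap between consecutive booth numbers.
gaps = [later - earlier for earlier, later in zip(booths, booths[1:])]
mean_gap = sum(gaps) / len(gaps)

print("booths surveyed:", n_booths)
print("average respondents per booth:", round(average_per_booth, 1))
print("distinct gaps:", sorted(set(gaps)), "mean gap:", round(mean_gap, 2))
```

The list contains 50 booths, which matches the reported average of 15.9 respondents per booth. Every gap between consecutive booths is 5 or 6, a pattern consistent with booths being selected at a regular, fractional interval across the booth list, although the report does not state the selection rule.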