Raking is a poststratification method that can be used when poststrata are formed using more than one variable, but only the marginal population totals are known.
Raking was first used in the 1940 census to ensure that the complete census data and samples taken from it gave consistent results. It was introduced by Deming and Stephan (1940); Brackstone and Rao (1976) further developed the theory. Oh and Scheuren (1983) describe raking ratio estimates for nonresponse.
Consider the following table of sums of weights from a sample:
| | Black | White | Asian | Native American | Other | Sum of Weights |
|---|---|---|---|---|---|---|
| Female | 300 | 1200 | 60 | 30 | 30 | 1620 |
| Male | 150 | 1080 | 90 | 30 | 30 | 1380 |
| Sum | 450 | 2280 | 150 | 60 | 60 | 3000 |
Population marginal totals: 1510 females and 1490 males; 600 Black, 2120 White, 150 Asian, 100 Native American, and 30 Other.
First, adjust the rows by multiplying each cell in
the female row by 1510/1620, and each cell in the male row
by 1490/1380:
| | Black | White | Asian | Native American | Other | Sum of Weights |
|---|---|---|---|---|---|---|
| Female | 279.63 | 1118.52 | 55.93 | 27.96 | 27.96 | 1510 |
| Male | 161.96 | 1166.09 | 97.17 | 32.39 | 32.39 | 1490 |
| Total | 441.59 | 2284.61 | 153.10 | 60.35 | 60.35 | 3000 |
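Next, adjust the columns by multiplying each cell in the Black column by 600 / 441.59, in the White column by 2120 / 2284.61, in the Asian column by 150 / 153.10, in the Native American column by 100 / 60.35, and in the Other column by 30 / 60.35.
Resulting table:
| | Black | White | Asian | Native American | Other | Sum of Weights |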
|---|---|---|---|---|---|---|
| Female | 379.94 | 1037.93 | 54.79 | 46.33 | 13.90 | 1532.90 |
| Male | 220.06 | 1082.07 | 95.21 | 53.67 | 16.10 | 1467.10 |
| Total | 600.00 | 2120.00 | 150.00 | 100.00 | 30.00 | 3000.00 |
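The row totals (1532.90 and 1467.10) no longer match the population totals, so repeat the row adjustment, then the column adjustment, and continue alternating until the totals stop changing. After convergence: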
| | Black | White | Asian | Native American | Other | Sum of Weights |
|---|---|---|---|---|---|---|
| Female | 375.59 | 1021.47 | 53.72 | 45.56 | 13.67 | 1510 |
| Male | 224.41 | 1098.53 | 96.28 | 54.44 | 16.33 | 1490 |
| Total | 600.00 | 2120.00 | 150.00 | 100.00 | 30.00 | 3000 |
These adjusted weights now match the marginal population totals. The
weighting-adjustment factor for white males is
1098.53 / 1080 ≈ 1.017, so the weight of each white male increases
slightly. The factor for white females is 1021.47 / 1200 ≈ 0.851; their
weights decrease because white females are overrepresented in the sample.
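The alternating row and column adjustments above can be reproduced with a short base-R sketch. The helper name `rake_table` and its `tol` and `max_iter` arguments are illustrative assumptions, not functions from any package; the sample table and marginal totals are those of the worked example.

```r
# Sample sums of weights (rows: sex, columns: race), from the first table.
w <- matrix(
  c(300, 1200, 60, 30, 30,
    150, 1080, 90, 30, 30),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Female", "Male"),
                  c("Black", "White", "Asian", "Native American", "Other"))
)

# Population marginal totals used in the worked example.
row_target <- c(Female = 1510, Male = 1490)
col_target <- c(Black = 600, White = 2120, Asian = 150,
                `Native American` = 100, Other = 30)

# rake_table() is a hypothetical helper written for this illustration.
rake_table <- function(tab, row_target, col_target, tol = 1e-8, max_iter = 100) {
  for (i in seq_len(max_iter)) {
    tab <- tab * (row_target / rowSums(tab))              # scale each row
    tab <- sweep(tab, 2, col_target / colSums(tab), `*`)  # scale each column
    # Columns match exactly after the column step; stop once rows match too.
    if (max(abs(rowSums(tab) - row_target)) < tol) break
  }
  tab
}

raked <- rake_table(w, row_target, col_target)
round(raked, 2)          # raked sums of weights
adjustment <- raked / w  # weighting-adjustment factor for each cell
round(adjustment, 3)
```

Each cell of `adjustment` is the factor by which the weights of respondents in that cell would be multiplied.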
Assumptions:
Methodological Overview: Iterative Raking and Analysis Workflow
This section details the statistical method used to analyze the Nilambur survey data and provides a step-by-step summary of the entire process, from initial data cleaning to the final weighted analysis.
What is Iterative Raking?
Iterative Raking (also known as RIM weighting or sample balancing) is a statistical procedure used to adjust the weights of survey respondents to make the sample more representative of a target population. In survey research, it is rare for a sample’s demographic profile to perfectly mirror the population it comes from. For instance, a survey might happen to include a higher percentage of young people or a lower percentage of women than exist in the actual population. This discrepancy, known as sampling bias, can lead to inaccurate conclusions.
Raking corrects for this bias. The process works by iteratively adjusting the weight of each respondent until the sample’s marginal distributions for key demographic variables (e.g., the percentage of males and females, the percentage of different age groups) match the known distributions of those same variables in the true population.
A key feature of raking is that it only requires the marginal population totals, not the joint totals. For example, to rake by age and gender, we only need to know:
The total number of males and females in the population.
The total number of people in each age bracket (e.g., 18-23, 24-30, etc.).
We do not need to know the joint distribution (e.g., the exact number of males aged 18-23). This makes it a very powerful and flexible technique. By ensuring our sample’s demographics align with reality, we can produce more accurate and reliable estimates for our variables of interest, such as MLA and government preference.
Summary of the Analysis Workflow:
The analysis of the Nilambur survey data was conducted in several sequential steps:
Data Loading and Cleaning:
The raw survey data was loaded from the .csv file. Relevant columns
were selected, and several variables (Category_Caste_1,
Pref_Govt, Gender, Age) were
recoded from their numeric codes into descriptive character strings for
clarity. Essential data cleaning was performed, such as correcting
spellings and standardizing category names to ensure consistency. A
Broad_Category variable was created to group detailed caste categories
into broader religious and social groups for the weighting process.
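As a rough illustration of the recoding step, the base-R sketch below converts numeric codes to descriptive strings. The toy data and the code-to-label mappings are assumptions made for this example; the survey's actual code book is not reproduced here.

```r
# Hypothetical raw data with numeric codes (not the actual survey file).
raw <- data.frame(
  Gender = c(1, 2, 2, 1),
  Age    = c(3, 1, 5, 2)
)

# Assumed code-to-label mappings, for illustration only.
gender_labels <- c("Male", "Female")
age_labels    <- c("18-23", "24-30", "31-45", "46-60", "60+")

clean <- raw
# factor() attaches a label to each code; as.character() keeps plain strings.
clean$Gender <- as.character(factor(raw$Gender, levels = 1:2, labels = gender_labels))
clean$Age    <- as.character(factor(raw$Age, levels = 1:5, labels = age_labels))

clean
```

Codes that fall outside the declared `levels` become `NA`, which makes unexpected values easy to spot before weighting.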
Exploratory Data Analysis (Unweighted):
Before applying any weights, an initial analysis was performed on the raw data. This involved calculating the unweighted proportions for key variables like MLA preference. Crucially, this step also involved comparing the demographic profile of the survey respondents (for Gender, Age, and Broad Category) against the known actual population data for Nilambur. This comparison highlighted the specific demographic imbalances in our sample that needed to be corrected.
Preparation for Raking:
The known population proportions for Gender, Age, and Broad_Category were used to create population target tables. These tables define the exact distributions that our weighted sample should match.
Survey Design and Raking:
Using the survey package in R, an unweighted survey design object was created, assigning an initial weight of 1 to every respondent. The rake() function was then applied. This function took our survey data and the population target tables as input and performed the iterative raking procedure, generating a new set of weights for each respondent.
Verification and Final Analysis:
The success of the raking process was verified by using the
svymean() function to check the weighted proportions of the
raking variables (Gender, Age, Broad_Category). The output confirmed
that the weighted sample now mirrored the population distributions.
Generating Weighted Results:
With the new weights, the final analysis was conducted.
The weighted proportions for MLA Preference and Government Preference were calculated.
To provide a measure of uncertainty, the Standard Error (SE) and the Margin of Error (MOE) at the 95% confidence level were also calculated for the weighted estimates.
A final summary table was produced, comparing the initial Unweighted Proportions against the final, more accurate Weighted Proportions.
Finally, a weighted crosstabulation was created using
svyby() to explore the relationship between MLA choice and
preferred government, providing deeper insights into the electorate’s
preferences.
| MLA | Unweighted_Prop |
|---|---|
| UDF | 47.2 |
| LDF | 37.5 |
| IND_Anvar | 6.4 |
| CNS | 4.8 |
| BJP | 3.3 |
| SDPI | 0.6 |
| Others | 0.3 |
| Caste_1 | Survey_% |
|---|---|
| Muslim | 49.2 |
| Christian | 15.2 |
| Ezhava | 13.6 |
| Nair | 6.8 |
| Others | 3.9 |
| Vishavakarma | 2.9 |
| Kalladi | 1.9 |
| Mannan | 1.6 |
| Kanakkan | 1.4 |
| Paniya/Paniyar/Payner/Paniyan | 1.1 |
| Age | Actual_Prop | Survey_Prop |
|---|---|---|
| 18-23 | 0.0900 | 0.0566 |
| 24-30 | 0.1431 | 0.0855 |
| 31-45 | 0.3447 | 0.2717 |
| 46-60 | 0.2538 | 0.3308 |
| 60+ | 0.1684 | 0.2553 |
| Broad_Category | Actual_Prop | Survey_Prop |
|---|---|---|
| Muslim | 0.4390 | 0.4918 |
| Hindu | 0.3424 | 0.2629 |
| Christian | 0.1080 | 0.1522 |
| Hindu_SC | 0.0772 | 0.0528 |
| Hindu_ST | 0.0334 | 0.0327 |
| Gender | Actual_Prop | Survey_Prop |
|---|---|---|
| Male | 0.48836 | 0.5572 |
| Female | 0.51161 | 0.4428 |
This document demonstrates how to use iterative raking (or RIM weighting) to adjust the weights of survey respondents so that the sample’s demographic profile matches known population characteristics. We have survey data from Nilambur and known population totals for Gender, Age, and Broad Religious/Caste Category. Raking will allow us to generate more accurate estimates of variables like MLA preference by correcting for over- or under-sampling of certain demographic groups.
First, we must prepare the data for the raking process. This involves two key steps:
1. Filtering our survey data to ensure there are no missing values in the variables we will use for weighting (Gender, Age, Broad_Category).
2. Creating “population margin” tables. The rake() function in the survey package needs to know the target population counts for each category within our weighting variables. We calculate these by multiplying the known population proportions by our final sample size.
Final Sample Size after removing NAs: 789
| Gender | Freq |
|---|---|
| Male | 385.3160 |
| Female | 403.6603 |
| Age | Freq |
|---|---|
| 18-23 | 71.0100 |
| 24-30 | 112.9059 |
| 31-45 | 271.9683 |
| 46-60 | 200.2482 |
| 60+ | 132.8676 |
| Broad_Category | Freq |
|---|---|
| Muslim | 346.3710 |
| Hindu | 270.1536 |
| Christian | 85.2120 |
| Hindu_SC | 60.9108 |
| Hindu_ST | 26.3526 |
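The margin tables above are the population proportions multiplied by the final sample size of 789. A minimal base-R sketch of that calculation follows; `make_margin` is a hypothetical helper name, and only Gender and Age are shown (Broad_Category would follow the same pattern).

```r
n <- 789  # final sample size after removing NAs

# Known population proportions, as listed in the comparison tables.
gender_prop <- c(Male = 0.48836, Female = 0.51161)
age_prop    <- c(`18-23` = 0.0900, `24-30` = 0.1431, `31-45` = 0.3447,
                 `46-60` = 0.2538, `60+` = 0.1684)

# make_margin() is a hypothetical helper: it returns a data frame holding the
# variable's categories and a Freq column of target counts.
make_margin <- function(prop, var_name, n) {
  out <- data.frame(names(prop), unname(prop) * n, stringsAsFactors = FALSE)
  names(out) <- c(var_name, "Freq")
  out
}

pop_gender <- make_margin(gender_prop, "Gender", n)
pop_age    <- make_margin(age_prop, "Age", n)

pop_gender
pop_age
```

The column holding the counts is named `Freq` here because that is the layout the margin tables above use.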
With our targets defined, we can now use the survey
package. We first create an unweighted survey design object, giving each
respondent an initial weight of 1. Then, we apply the
rake() function. It iteratively adjusts the respondent
weights until the sample’s marginal distributions for Gender, Age, and
Broad Category match our specified population targets.
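The sketch below shows the general shape of this step with the survey package, using a small toy data set rather than the Nilambur data. The data, the two-category age split, and the target counts are all assumptions chosen so the example is easy to follow; only `svydesign()`, `rake()`, `weights()`, and `svytable()` come from the package.

```r
library(survey)

# Toy sample (NOT the Nilambur data): 60 men, 40 women, ages alternating.
toy <- data.frame(
  Gender = factor(rep(c("Male", "Female"), times = c(60, 40))),
  Age    = factor(rep(c("18-45", "46+"), times = 50)),
  wt     = 1  # initial weight of 1 for every respondent
)

# Hypothetical population targets, scaled to the sample size (100).
pop_gender <- data.frame(Gender = c("Male", "Female"), Freq = c(49, 51))
pop_age    <- data.frame(Age = c("18-45", "46+"), Freq = c(58, 42))

# Unweighted design object: no clustering, equal starting weights.
design <- svydesign(ids = ~1, weights = ~wt, data = toy)

# Rake on both margins; one formula and one target table per variable.
raked <- rake(
  design,
  sample.margins     = list(~Gender, ~Age),
  population.margins = list(pop_gender, pop_age)
)

summary(weights(raked))   # distribution of the raked weights
svytable(~Gender, raked)  # weighted counts, to compare with pop_gender
```

A third variable such as Broad_Category would be added as one more formula in `sample.margins` and one more table in `population.margins`.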
Summary of the new survey weights:
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 0.5077 | 0.6101 | 0.9484 | 1.0000 | 1.1719 | 2.3742 |
Because the population targets were scaled to the sample size, the raked weights should average 1. Here the mean weight is 1.0000, with individual weights ranging from 0.51 to 2.37.
It is crucial to check if the raking was successful. We can do this
by calculating the weighted proportions of our raking variables in the
sample. The results should now closely match the known population
proportions. Using svymean provides a clear, simple table
of the final proportions.
| | mean | SE |
|---|---|---|
| GenderFemale | 51.2 | 0.003 |
| GenderMale | 48.8 | 0.003 |
| | mean | SE |
|---|---|---|
| Age18-23 | 9.0 | 0.001 |
| Age24-30 | 14.3 | 0.001 |
| Age31-45 | 34.5 | 0.001 |
| Age46-60 | 25.4 | 0.001 |
| Age60+ | 16.8 | 0.001 |
| | mean | SE |
|---|---|---|
| Broad_CategoryChristian | 10.8 | 0 |
| Broad_CategoryHindu | 34.2 | 0 |
| Broad_CategoryHindu_SC | 7.7 | 0 |
| Broad_CategoryHindu_ST | 3.3 | 0 |
| Broad_CategoryMuslim | 43.9 | 0 |
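The same kind of check can be sketched in base R without a design object: sum the weights within each category and divide by the total weight. This stands in for svymean() purely for illustration; `weighted_share` is a hypothetical helper, and the weights and targets below are toy values.

```r
# weighted_share() is a hypothetical helper: share of total weight per category.
weighted_share <- function(w, group) {
  tapply(w, group, sum) / sum(w)
}

# Toy inputs for illustration only (not the survey data).
group  <- c("Male", "Male", "Male", "Female", "Female")
w      <- c(0.8, 0.8, 0.8, 1.3, 1.3)
target <- c(Female = 0.52, Male = 0.48)

share   <- weighted_share(w, group)
max_gap <- max(abs(share[names(target)] - target))

round(share, 3)
max_gap < 0.001  # TRUE when every category is within 0.1 point of its target
```

Unlike svymean(), this helper reports no standard errors; it only confirms that the weighted shares land on the targets.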
Now for the final step: analyzing our variable of interest, MLA preference. We will calculate the weighted proportions and compare them to the original, unweighted results. We will also include the Standard Error (SE), which measures the variability of our estimate, and the Margin of Error (MOE) at a 95% confidence level, which gives us a range within which the true population value likely falls.
| MLA | Unweighted Prop (%) | Weighted Prop (%) | Std. Error | Margin of Error (95% CI) |
|---|---|---|---|---|
| UDF | 47.28 | 45.40 | 1.89 | 3.71 |
| LDF | 37.39 | 39.21 | 1.86 | 3.65 |
| IND_Anvar | 6.46 | 5.96 | 0.87 | 1.70 |
| CNS | 4.82 | 4.53 | 0.76 | 1.49 |
| BJP | 3.30 | 4.20 | 0.83 | 1.63 |
| SDPI | 0.63 | 0.62 | 0.29 | 0.56 |
| Others | 0.13 | 0.08 | 0.08 | 0.15 |
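The margin of error in the table is consistent with the usual normal approximation, MOE = z × SE with z ≈ 1.96 at the 95% level. The base-R sketch below applies that formula to the rounded standard errors from the table; small differences from the published MOE values are expected because the report presumably used unrounded standard errors.

```r
# Weighted estimates and standard errors (percentage points), from the table.
est <- c(UDF = 45.40, LDF = 39.21, IND_Anvar = 5.96)
se  <- c(UDF = 1.89,  LDF = 1.86,  IND_Anvar = 0.87)

z   <- qnorm(0.975)  # about 1.96 for a two-sided 95% interval
moe <- z * se        # margin of error for each estimate

# Approximate 95% confidence interval: estimate plus or minus the MOE.
ci <- data.frame(
  estimate = est,
  moe      = round(moe, 2),
  lower    = round(est - moe, 2),
  upper    = round(est + moe, 2)
)
ci
```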
| Weighted_Diff | SE.diff | MOE.diff |
|---|---|---|
| 6.197 | 3.498 | 6.856 |
The projected vote margin between UDF and LDF (in favour of UDF) is 10,824 votes.
| Govt | mean | SE |
|---|---|---|
| UDF | 55.4 | 1.9 |
| LDF | 34.2 | 1.8 |
| NDA | 5.7 | 0.9 |
| CNS | 3.6 | 0.7 |
| Others | 1.0 | 0.4 |
| MLA_Choice | UDF | LDF | NDA | CNS | Others |
|---|---|---|---|---|---|
| BJP | 11.0 | 12.6 | 76.4 | 0.0 | 0.0 |
| CNS | 51.6 | 14.0 | 2.0 | 29.6 | 2.8 |
| IND_Anvar | 73.2 | 13.5 | 3.5 | 7.8 | 2.0 |
| LDF | 16.9 | 76.6 | 2.7 | 3.3 | 0.4 |
| Others | 0.0 | 0.0 | 100.0 | 0.0 | 0.0 |
| SDPI | 63.8 | 12.4 | 0.0 | 0.0 | 23.8 |
| UDF | 90.8 | 4.6 | 2.3 | 1.1 | 1.1 |
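Each row of the crosstabulation above sums to 100, so it reads as row percentages: among respondents choosing a given MLA, the weighted share preferring each government. The report used svyby() for this; the base-R sketch below is a simplified stand-in using xtabs() and prop.table() on hypothetical toy data with a weight column `w`.

```r
# Toy respondents with raked weights (illustrative values only).
toy <- data.frame(
  MLA  = c("UDF", "UDF", "UDF", "LDF", "LDF"),
  Govt = c("UDF", "UDF", "LDF", "LDF", "UDF"),
  w    = c(1.0, 1.0, 2.0, 1.5, 0.5)
)

# Weighted counts: the left-hand side of the formula supplies the weights.
counts <- xtabs(w ~ MLA + Govt, data = toy)

# Row percentages: within each MLA choice, the share preferring each government.
row_pct <- 100 * prop.table(counts, margin = 1)
round(row_pct, 1)
```

This gives point estimates only; a design-based function would be needed to attach standard errors to each cell.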
The booths (PB) surveyed are: 2 7 13 18 23 28 34 39 44 49 55 60 65 70 76 81 86 91 97 102 107 112 118 123 128 134 139 144 149 155 160 165 170 176 181 186 191 197 202 207 212 218 223 228 233 239 244 249 254 260
Average number of respondents per booth: 15.9
Number of respondents per booth ranges from 13 to 19.
Total number of approved respondents: 795