This analysis applies Association Rules Mining to the UCI Adult Census Income dataset to find combinations of demographic and economic characteristics that are associated with incomes above $50,000.
Association Rules Mining is a technique for discovering relationships between variables in large datasets. It finds combinations of values that frequently occur together and expresses them as if-then rules.
In Simple Terms: It answers questions like “What characteristics appear together?” and “If someone has characteristic A, how much more likely are they to have characteristic B?”
The UCI Adult Census Income Dataset contains demographic and employment information for 48,842 individuals. The analysis below uses 30,162 of these records.
| Income | Count | Percentage |
|---|---|---|
| <=50K | 22654 | 75.1% |
| >50K | 7508 | 24.9% |
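Data Quality Note: The income classes are imbalanced, with 75.1% of records earning ≤$50K and 24.9% earning >$50K. This 24.9% base rate is the reference point for lift: a rule with a lift of 2 identifies a group in which the share earning >$50K is about twice the base rate, roughly 50%.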
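For a rule of the form X => {income=>50K}, lift is the rule's confidence divided by this base rate. The Python sketch below is only an illustration (the report itself was produced in R): it recomputes the base rate from the counts in the income table and checks the reported lift of the top three rules. Small differences are expected because the confidences are copied as printed, rounded to four decimals.

```python
# Class counts copied from the income table
count_le_50k = 22654
count_gt_50k = 7508
n_records = count_le_50k + count_gt_50k  # 30,162 records analyzed

# Base rate: share of all records with income >50K
base_rate = count_gt_50k / n_records

# (confidence, reported lift) for the top three rules, copied from the rules table
reported = {
    "{education=Bachelors,relationship=Husband}": (0.6878, 2.763),
    "{marital_status=Married-civ-spouse,occupation=Exec-managerial}": (0.6831, 2.744),
    "{education=Bachelors,marital_status=Married-civ-spouse}": (0.6818, 2.739),
}

print(f"base rate = {base_rate:.4f}")
for lhs, (confidence, lift_reported) in reported.items():
    # Lift of X => {income=>50K} = P(>50K | X) / P(>50K)
    lift = confidence / base_rate
    print(f"{lhs} => {{income=>50K}}: lift = {lift:.3f} (reported {lift_reported})")
```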
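Apriori operates on categorical items, so continuous variables are first converted into categorical ranges. The rules below show some of the resulting items, for example the age bands age=36-45 and age=46-55, hours_per_week=Heavy, and the indicators capital_gain=HasGain and capital_loss=NoLoss.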
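A minimal sketch of this conversion in Python, assuming each record is a dict keyed by the column names that appear in the rules. Only the labels that occur in the discovered rules (the 36-45 and 46-55 age bands, Heavy, HasGain, NoLoss) come from the report; the cut-offs, the other labels (<=25, 26-35, 56-65, 66+, Part-time, Full-time, NoGain, HasLoss) and the helper names are hypothetical.

```python
def age_band(age):
    """Map an age to a 10-year band label such as '36-45' (band edges partly assumed)."""
    if age <= 25:
        return "<=25"
    if age >= 66:
        return "66+"
    lower = 26 + ((age - 26) // 10) * 10
    return f"{lower}-{lower + 9}"


def hours_band(hours):
    """Hypothetical cut-offs; only the label 'Heavy' appears in the report."""
    if hours < 35:
        return "Part-time"
    if hours <= 40:
        return "Full-time"
    return "Heavy"


def to_transaction(record):
    """Turn one census record into a set of 'attribute=value' items."""
    items = {
        f"age={age_band(record['age'])}",
        f"hours_per_week={hours_band(record['hours_per_week'])}",
        "capital_gain=HasGain" if record["capital_gain"] > 0 else "capital_gain=NoGain",
        "capital_loss=HasLoss" if record["capital_loss"] > 0 else "capital_loss=NoLoss",
        f"income={record['income']}",
    }
    # Already-categorical attributes become items unchanged
    for key in ("education", "marital_status", "occupation", "relationship"):
        items.add(f"{key}={record[key]}")
    return frozenset(items)


example = {
    "age": 42, "hours_per_week": 50, "capital_gain": 0, "capital_loss": 0,
    "education": "Bachelors", "marital_status": "Married-civ-spouse",
    "occupation": "Exec-managerial", "relationship": "Husband", "income": ">50K",
}
print(sorted(to_transaction(example)))
```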
Summary of support, confidence, and lift across the 11 discovered rules:

| Metric | Support | Confidence | Lift |
|---|---|---|---|
| Count | 11 | 11 | 11 |
| Min | 0.0525 | 0.5011 | 2.0133 |
| Mean | 0.0619 | 0.5995 | 2.4085 |
| Median | 0.0597 | 0.5722 | 2.2988 |
| Max | 0.0750 | 0.6878 | 2.7632 |
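These summary statistics can be reproduced from the rules table. This Python sketch copies each rule's support, confidence, and lift as printed there (already rounded), so its results agree with the summary table to about four decimal places.

```python
import statistics

# (support, confidence, lift) for the 11 rules, in rank order, copied from the rules table
rules = [
    (0.0525, 0.6878, 2.763),
    (0.0542, 0.6831, 2.744),
    (0.0587, 0.6818, 2.739),
    (0.0529, 0.6284, 2.525),
    (0.0529, 0.6284, 2.525),
    (0.0707, 0.5722, 2.299),
    (0.0741, 0.5706, 2.292),
    (0.0597, 0.5674, 2.280),
    (0.0651, 0.5642, 2.267),
    (0.0750, 0.5097, 2.048),
    (0.0651, 0.5011, 2.013),
]

summary = {}
for i, name in enumerate(("Support", "Confidence", "Lift")):
    column = [rule[i] for rule in rules]
    summary[name] = {
        "count": len(column),
        "min": min(column),
        "mean": statistics.mean(column),
        "median": statistics.median(column),
        "max": max(column),
    }
    print(name, {k: round(v, 4) for k, v in summary[name].items()})
```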

All 11 rules, ranked by lift:

| Rank | Rule | Support | Confidence | Lift |
|---|---|---|---|---|
| 1 | {education=Bachelors,relationship=Husband} => {income=>50K} | 5.25% | 68.78% | 2.763 |
| 2 | {marital_status=Married-civ-spouse,occupation=Exec-managerial} => {income=>50K} | 5.42% | 68.31% | 2.744 |
| 3 | {education=Bachelors,marital_status=Married-civ-spouse} => {income=>50K} | 5.87% | 68.18% | 2.739 |
| 4 | {capital_gain=HasGain} => {income=>50K} | 5.29% | 62.84% | 2.525 |
| 5 | {capital_gain=HasGain,capital_loss=NoLoss} => {income=>50K} | 5.29% | 62.84% | 2.525 |
| 6 | {relationship=Husband,hours_per_week=Heavy} => {income=>50K} | 7.07% | 57.22% | 2.299 |
| 7 | {marital_status=Married-civ-spouse,hours_per_week=Heavy} => {income=>50K} | 7.41% | 57.06% | 2.292 |
| 8 | {age=46-55,relationship=Husband} => {income=>50K} | 5.97% | 56.74% | 2.280 |
| 9 | {age=46-55,marital_status=Married-civ-spouse} => {income=>50K} | 6.51% | 56.42% | 2.267 |
| 10 | {age=36-45,marital_status=Married-civ-spouse} => {income=>50K} | 7.5% | 50.97% | 2.048 |
| 11 | {age=36-45,relationship=Husband} => {income=>50K} | 6.51% | 50.11% | 2.013 |
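Rules 4 and 5 have identical support, confidence, and lift: adding capital_loss=NoLoss to capital_gain=HasGain leaves all three measures unchanged, so rule 5 adds no information beyond rule 4. A common convention, also used by `is.redundant()` in arules, treats a rule as redundant when a more general rule (same right-hand side, left-hand side a proper subset) has equal or higher confidence. The Python sketch below is a hypothetical stand-alone version of that check applied to the 11 rules, not the report's own code; it uses the confidences as printed, so a tie means a tie to four decimals.

```python
def lhs(*items):
    """Build a left-hand side item set (helper for readability)."""
    return frozenset(items)


# Rule id -> (left-hand side, confidence); every rule has RHS {income=>50K}
rules = {
    1: (lhs("education=Bachelors", "relationship=Husband"), 0.6878),
    2: (lhs("marital_status=Married-civ-spouse", "occupation=Exec-managerial"), 0.6831),
    3: (lhs("education=Bachelors", "marital_status=Married-civ-spouse"), 0.6818),
    4: (lhs("capital_gain=HasGain"), 0.6284),
    5: (lhs("capital_gain=HasGain", "capital_loss=NoLoss"), 0.6284),
    6: (lhs("relationship=Husband", "hours_per_week=Heavy"), 0.5722),
    7: (lhs("marital_status=Married-civ-spouse", "hours_per_week=Heavy"), 0.5706),
    8: (lhs("age=46-55", "relationship=Husband"), 0.5674),
    9: (lhs("age=46-55", "marital_status=Married-civ-spouse"), 0.5642),
    10: (lhs("age=36-45", "marital_status=Married-civ-spouse"), 0.5097),
    11: (lhs("age=36-45", "relationship=Husband"), 0.5011),
}


def redundant_rules(rules):
    """Ids of rules for which a more general rule has equal or higher confidence."""
    redundant = set()
    for rule_id, (items, confidence) in rules.items():
        for other_items, other_confidence in rules.values():
            # '<' on frozensets tests for a proper subset
            if other_items < items and other_confidence >= confidence:
                redundant.add(rule_id)
                break
    return redundant


print(sorted(redundant_rules(rules)))
```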
Rule: {education=Bachelors,relationship=Husband} => {income=>50K}
What it means: Among people who meet the condition, 68.8% earn >$50K.
Strength: Lift 2.76, i.e. earning >$50K is 2.76 times as likely in this group as in the dataset overall (24.9%).
Frequency: 5.25% of all records match both the condition and the >$50K outcome.
Rule: {marital_status=Married-civ-spouse,occupation=Exec-managerial} => {income=>50K}
What it means: Among people who meet the condition, 68.3% earn >$50K.
Strength: Lift 2.74, i.e. earning >$50K is 2.74 times as likely in this group as in the dataset overall (24.9%).
Frequency: 5.42% of all records match both the condition and the >$50K outcome.
Rule: {education=Bachelors,marital_status=Married-civ-spouse} => {income=>50K}
What it means: Among people who meet the condition, 68.2% earn >$50K.
Strength: Lift 2.74, i.e. earning >$50K is 2.74 times as likely in this group as in the dataset overall (24.9%).
Frequency: 5.87% of all records match both the condition and the >$50K outcome.
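Support: How common is the pattern? It is the share of all records that match both the condition and the outcome. For example, 8% support means 8 out of every 100 people have the full combination of characteristics, including income >$50K. Higher values mean more common patterns.
Confidence: How reliable is the prediction? It is the share of people meeting the condition who also have the outcome. For example, 65% confidence means that 65% of people who meet the condition earn >$50K. Higher values mean more reliable rules.
Lift: How much does the condition raise the likelihood of the outcome above the baseline? It is the confidence divided by the overall share of people with the outcome. For example, a lift of 2.5 means that earning >$50K is 2.5 times as likely among people meeting the condition as in the population overall. Lift > 1 indicates a positive association, lift = 1 no association, and lift < 1 a negative association.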
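All three measures come from counting records. In the Python sketch below each record is a set of items, and the helper `rule_metrics` (a hypothetical name) returns support, confidence, and lift for a rule lhs => rhs. The ten toy records are made up so the arithmetic is easy to follow: 4 of 10 records have education=Bachelors, 3 of those earn >$50K, and 4 of 10 earn >$50K overall.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) of the rule lhs => rhs.

    transactions: list of frozensets of 'attribute=value' items
    lhs, rhs: frozensets of items
    """
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)           # records meeting the condition
    n_rhs = sum(1 for t in transactions if rhs <= t)           # records with the outcome
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)  # records with both
    support = n_both / n
    confidence = n_both / n_lhs if n_lhs else 0.0
    lift = confidence / (n_rhs / n) if n_rhs else 0.0
    return support, confidence, lift


# Ten made-up records (hypothetical toy data, not taken from the census file)
toy = (
    [frozenset({"education=Bachelors", "income=>50K"})] * 3
    + [frozenset({"education=Bachelors", "income=<=50K"})]
    + [frozenset({"education=HS-grad", "income=>50K"})]
    + [frozenset({"education=HS-grad", "income=<=50K"})] * 5
)

support, confidence, lift = rule_metrics(
    toy, frozenset({"education=Bachelors"}), frozenset({"income=>50K"})
)
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

Here support is 3/10 = 0.30, confidence is 3/4 = 0.75, and lift is 0.75 / 0.40 = 1.875.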

Note: A total of 11 rules were discovered; all of them are listed in the rules table, ranked by lift.
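This analysis of the Adult Census dataset found 11 association rules for income >$50K, with confidence from 50.1% to 68.8% and lift from 2.01 to 2.76. Each rule picks out a group in which earning >$50K is roughly two to nearly three times as common as the 24.9% base rate.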
Report Generated: February 01, 2026
R Version: 4.5.1 (2025-06-13)
Dataset: UCI Adult Census Income
Records Analyzed: 30,162
Association Rules Found: 11
Algorithm: Apriori with Support ≥ 5% & Confidence ≥ 50%
Libraries Used: arules, tidyverse, ggplot2, knitr
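The Apriori settings listed above can be illustrated with a compact, pure-Python version of the algorithm. This is a teaching sketch, not the arules implementation the report used (in R this would typically be a call to `apriori()` with `parameter = list(supp = 0.05, conf = 0.5)`). It finds frequent itemsets level by level, discards any candidate that has an infrequent subset, and then keeps only rules whose right-hand side is the target item. The `max_len = 3` limit (at most two items on the left, matching the discovered rules), the function names, and the toy records are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len=3):
    """Level-wise Apriori: return {itemset: support} for all frequent itemsets up to max_len."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
    current, k = list(frequent), 1
    while current and k < max_len:
        k += 1
        # Candidates: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Apriori property: every (k-1)-subset of a frequent itemset must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        current = []
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                frequent[c] = support
                current.append(c)
    return frequent


def rules_for_target(frequent, target, min_confidence):
    """Rules X => {target}, as (lhs, support, confidence, lift), sorted by lift."""
    rhs = frozenset([target])
    found = []
    for itemset, support in frequent.items():
        if target not in itemset or len(itemset) < 2:
            continue
        lhs = itemset - rhs
        confidence = support / frequent[lhs]  # lhs is frequent by the Apriori property
        if confidence >= min_confidence:
            found.append((sorted(lhs), support, confidence, confidence / frequent[rhs]))
    return sorted(found, key=lambda r: r[3], reverse=True)


# Ten made-up records (hypothetical toy data)
toy = (
    [frozenset({"education=Bachelors", "relationship=Husband", "income=>50K"})] * 3
    + [frozenset({"education=Bachelors", "relationship=Husband", "income=<=50K"})]
    + [frozenset({"education=HS-grad", "relationship=Husband", "income=<=50K"})] * 3
    + [frozenset({"education=HS-grad", "relationship=Not-in-family", "income=<=50K"})] * 3
)
frequent = frequent_itemsets(toy, min_support=0.2)
found = rules_for_target(frequent, "income=>50K", min_confidence=0.5)
for lhs, support, confidence, lift in found:
    print(lhs, f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

The toy run uses `min_support = 0.2` because with only ten records a 5% threshold would keep every item that occurs at all; on the full data the report's thresholds of 0.05 and 0.5 apply. Two rules survive here, both with confidence 0.75 and lift 2.5, while {relationship=Husband} => {income=>50K} is dropped because its confidence, 3/7, is below 0.5.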