REDESIGN PROJECT

This redesign proposes changes to tables on Wikipedia that present the results of the 2006 elections for the National Congress of Peru, a topic we will also use for the final project. The tables can be found at the following links:

link1 = "http://es.wikipedia.org/wiki/Elecciones_generales_de_Per%C3%BA_de_2006"
link2 = "http://en.wikipedia.org/wiki/Peruvian_general_election,_2006"

DATA PREPARATION STAGE:

We show the links because we will use them to scrape the data directly from those pages, using R's facilities for parsing web pages. To simplify the code in this data preparation stage, we have written a function:

options(width = 180)
library(XML)
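getTableWWW = function(link, which, headers, skips, cols) {
    # parse the page and keep the `which`-th <table> element
    dirty = getNodeSet(htmlParse(link, encoding = "UTF-8"), "//table")[[which]]
    # convert it to a data frame with the given header, skipped rows and column classes
    return(readHTMLTable(dirty, as.data.frame = TRUE, header = headers, skip.rows = skips, 
        colClasses = cols))
}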

Using the function above, we can scrape the tables we will redesign:

# table positions (5th, 8th, 11th) refer to the pages as they were when this was written
tableParty = getTableWWW(link1, 5, c("Party", "Seats", "Votes"), c(1, 9), c("character", "numeric", "FormattedInteger"))
tableRegion = getTableWWW(link2, 8, TRUE, 26, c("character", "FormattedInteger", rep("integer", 4)))
tableRegionParty = getTableWWW(link2, 11, TRUE, 26, c("character", rep("integer", 8)))
# empty cells (no seats won) become 0
tableRegionParty[is.na(tableRegionParty)] <- 0

IMPROVEMENT 1:

We first propose an improvement to the party results table, which in its original form simply lists the numbers.

As we can see:

The table presents numbers and no further context (unless we read the whole article).
It does not help comparison, as we cannot tell from the table who is who. In politics, especially in multiparty systems, you need to know who is governing, who is the opposition, and who the minority parties are.
Following from the point above, we want to know not only the results but also the political situation: if the governing party does not have enough seats, it must negotiate with other parties to keep the country governable.

We therefore start by adding one piece of information to the original table: the role, or position, of each party in Congress:

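# the table lists the opposition party first and the governing party second; minority parties get no label
tableParty$Position = c("Opposition", "Governing", rep("", nrow(tableParty) - 2))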

Note that we have not included a “Minority” label, because it would cause trouble later, when we add the position as white text on the bars. We now use lattice to plot a bar chart that improves comparability, highlights differences, and adds some context to catch the reader's attention:

options(width = 180)
library(lattice)
author = "author: Jose Magallanes"
ord <- order(as.numeric(tableParty$Seats))
levelnew = factor(tableParty$Party)[ord]
tableParty$Party = factor(tableParty$Party, levels = levelnew, ordered = TRUE)
voteOp = tableParty$Seats[1]  # seats of the opposition party (first row)
voteGb = tableParty$Seats[2]  # seats of the governing party (second row)
# x = 61 marks the absolute majority of the 120-seat Congress
barchart(Party ~ Seats, data = tableParty, xlab = "Number of Seats obtained\n", ylab = "Parties", sub = paste("Source: National Office of Electoral Processes\n", author), main = "Results in 2006-2011 Elections for National Congress in Peru", 
    scales = list(x = list(limits = c(-1, 62)), y = list(cex = 1.2, rot = 45)), panel = function(y, x, ...) {
        panel.fill(col = "gray90")
        panel.grid(h = 0, v = -1, col = "black", lty = "dotted")
        # bar colours follow the row order: red = opposition, blue = governing, orange = minority parties
        panel.barchart(x, y, col = c("#FF3333", "#004C99", rep("#FF8000", 5)))
        # grey copy slightly offset, black copy on top: a shadow effect for the seat counts
        panel.text(x + 1.1, y, label = round(x, 2), cex = 1.5, col = "grey")
        panel.text(x + 1, y, label = round(x, 2), cex = 1.5, col = "black")
        panel.text(-1.2, y, label = tableParty$Position, cex = 2, col = "white", pos = 4)
        panel.text(voteGb + 2, 5.78, label = paste("Votes to get majority:", 61 - voteGb), cex = 1.5, col = "grey", pos = 4)
        panel.text(voteGb + 2, 5.8, label = paste("Votes to get majority:", 61 - voteGb), cex = 1.5, col = "blue", pos = 4)
        panel.text(voteOp + 1.8, 6.78, label = paste("Votes to get majority:", 61 - voteOp), cex = 1.5, col = "grey", pos = 4)
        panel.text(voteOp + 1.8, 6.8, label = paste("Votes to get majority:", 61 - voteOp), cex = 1.5, col = "red", pos = 4)
        panel.text((voteGb - 10) - 0.01, 2.45, label = "Which party\nfrom the minority\nwill help the governing party\n rule the Congress?", cex = 3, col = "black")
        panel.text(voteGb - 10, 2.5, label = "Which party\nfrom the minority\nwill help the governing party\n rule the Congress?", cex = 3, col = "#CC6600")
        panel.arrows(voteOp + 2, 7, 61, 7, col = "red", lwd = 3, length = 0.1)
        panel.arrows(voteGb + 2, 6, 61, 6, col = "blue", lwd = 3, length = 0.1)
        panel.abline(h = 0, v = 61, col = "black", lwd = 3)
    })

[Figure: Results in 2006-2011 Elections for National Congress in Peru]

The chart now makes it clear that:

The governing party is not the first power in Congress.
The opposition party may weaken governability.
Both the governing party and the opposition party have several ways of reaching an absolute majority.
Although the opposition seems to have conquered Congress, in this political scenario the governing party, which controls the executive branch, has more concrete “tools” to win the support of minority parties.
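The claim that both large parties have several routes to an absolute majority can be checked by brute force. A minimal sketch in R, assuming (consistently with their 45 and 36 seats) that the opposition party in the chart is UPP and the governing party is PAP; the seat totals are the column sums of the regional table shown in Improvement 2, and `coalitions()` is a hypothetical helper written only for this illustration:

```r
# Seats per party in the 2006-2011 Congress: column sums of the regional table
# in Improvement 2 (assumption: UPP = opposition, PAP = governing party)
seats <- c(UPP = 45, PAP = 36, UN = 17, AF = 13, FC = 5, PP = 2, RN = 2)
majority <- 61                                   # absolute majority of 120 seats
minority <- seats[!(names(seats) %in% c("UPP", "PAP"))]

# Hypothetical helper: every combination of minority parties that lifts
# `leader` to at least `majority` seats
coalitions <- function(leader) {
  needed <- majority - seats[[leader]]
  found <- list()
  for (k in seq_along(minority)) {
    for (combo in combn(names(minority), k, simplify = FALSE)) {
      if (sum(minority[combo]) >= needed) found[[length(found) + 1]] <- combo
    }
  }
  found
}

# number of winning coalitions for each large party
sapply(c("UPP", "PAP"), function(p) length(coalitions(p)))
```

With these numbers the opposition can reach 61 with UN alone, while the governing party needs at least two minority parties (UN plus AF, or UN plus FC, PP and RN).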

The chosen colors group the parties by position, so we do not need a separate color for every party. The shadows on some of the texts make important information and questions stand out and engage the reader. The grid helps, although the differences are already very clear. Most importantly, the line marking the number of seats needed for a majority clarifies the purpose of the chart: it shows where every party stands and how far it is from the optimal position.
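The shadow effect comes from drawing the same text twice: a grey copy slightly offset, then the coloured copy on top, which the chart code does by repeating `panel.text()` calls. A minimal sketch that wraps the pattern in a helper; `panel.shadowtext()` and its `shadow`, `dx` and `dy` arguments are hypothetical names introduced here, and the two-bar chart (seats of the two largest parties) is only illustrative:

```r
library(lattice)

# Hypothetical helper: grey copy of the label slightly offset, coloured copy on top
panel.shadowtext <- function(x, y, labels, col = "black", shadow = "grey",
                             dx = 0.1, dy = -0.02, ...) {
  panel.text(x + dx, y + dy, labels = labels, col = shadow, ...)
  panel.text(x, y, labels = labels, col = col, ...)
}

# Illustrative chart: seats of the two largest parties with a shadowed note
seats <- data.frame(Party = c("PAP", "UPP"), Seats = c(36, 45))
chart <- barchart(Party ~ Seats, data = seats, xlim = c(0, 62),
  panel = function(x, y, ...) {
    panel.barchart(x, y, ...)
    panel.shadowtext(x + 1, y, labels = paste("Seats needed for majority:", 61 - x),
                     col = "blue", cex = 0.8, pos = 4)
  })
print(chart)
```

Keeping the offset in one place makes it easy to tune the shadow for every label at once.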

IMPROVEMENT 2:

Now let's look at the regional situation, which the original page presents in two separate tables.

In Peru, instead of “States” as in the USA, we have “Regions”. Seats are allocated to each Region in proportion to its population. However, because Lima, the capital of Peru, is by far the most populated Region (35 of the 120 seats, according to the merged table below), it weighs heavily on the representativeness of Congress as a whole and clearly shapes the result.

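To highlight this situation, we need to combine both tables into one, so that we can visualize precisely the distribution of seats per Region and per party. For each Region we first identify the winning party (the one with the most seats), then merge the per-party table with the per-Region table, and finally compute the share of the Region's seats won by that party: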

options(width = 180)
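tableRegionParty$Total = NULL  # drop the per-region Total column
# For each region (row), record the party with the most seats and its column code.
# Later checks overwrite earlier ones, so UPP, checked last, wins any tie it is part of.
for (i in seq_len(nrow(tableRegionParty))) {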
    if (tableRegionParty[i, 3] == max(tableRegionParty[i, 2:8])) {
        tableRegionParty$group[i] = "PAP"
        tableRegionParty$groupcode[i] = 2
    }
    if (tableRegionParty[i, 4] == max(tableRegionParty[i, 2:8])) {
        tableRegionParty$group[i] = "UN"
        tableRegionParty$groupcode[i] = 3
    }
    if (tableRegionParty[i, 5] == max(tableRegionParty[i, 2:8])) {
        tableRegionParty$group[i] = "AF"
        tableRegionParty$groupcode[i] = 4
    }
    if (tableRegionParty[i, 6] == max(tableRegionParty[i, 2:8])) {
        tableRegionParty$group[i] = "FC"
        tableRegionParty$groupcode[i] = 5
    }
    if (tableRegionParty[i, 7] == max(tableRegionParty[i, 2:8])) {
        tableRegionParty$group[i] = "PP"
        tableRegionParty$groupcode[i] = 6
    }
    if (tableRegionParty[i, 8] == max(tableRegionParty[i, 2:8])) {
        tableRegionParty$group[i] = "RN"
        tableRegionParty$groupcode[i] = 7
    }
    if (tableRegionParty[i, 2] == max(tableRegionParty[i, 2:8])) {
        tableRegionParty$group[i] = "UPP"
        tableRegionParty$groupcode[i] = 1
    }
    # manual tie-breaks for the regions where two parties won the same number of seats
    if (tableRegionParty[i, 1] == "Pasco") {
        tableRegionParty$group[i] = "AF"
        tableRegionParty$groupcode[i] = 4
    }
    if (tableRegionParty[i, 1] == "Lima") {
        tableRegionParty$group[i] = "UN"
        tableRegionParty$groupcode[i] = 3
    }
    if (tableRegionParty[i, 1] == "Ancash") {
        tableRegionParty$group[i] = "PAP"
        tableRegionParty$groupcode[i] = 2
    }
}
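# add the region-level columns (registered voters, seats in Congress, candidates, ...)
table <- merge(tableRegionParty, tableRegion, by = "Electoral District")
# share of the region's seats (column 12, "Seats in Congress") won by the winning party
for (i in seq_len(nrow(table))) {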
    if (table$group[i] == "UPP") 
        table$percentWinner[i] = table[i, 2]/table[i, 12]
    if (table$group[i] == "UN") 
        table$percentWinner[i] = table[i, 4]/table[i, 12]
    if (table$group[i] == "RN") 
        table$percentWinner[i] = table[i, 8]/table[i, 12]
    if (table$group[i] == "PAP") 
        table$percentWinner[i] = table[i, 3]/table[i, 12]
    if (table$group[i] == "AF") 
        table$percentWinner[i] = table[i, 5]/table[i, 12]
}
table

##    Electoral District UPP PAP UN AF FC PP RN group groupcode Registered voters Seats in Congress Candidates per party Participating parties Total candidates percentWinner
## 1            Amazonas   1   1  0  0  0  0  0   UPP         1            179331                 2                    3                    17               47        0.5000
## 2              Ancash   2   2  1  0  0  0  0   PAP         2            611881                 5                    5                    21               99        0.4000
## 3            Apurímac   2   0  0  0  0  0  0   UPP         1            195954                 2                    3                    21               55        1.0000
## 4            Arequipa   3   1  1  0  0  0  0   UPP         1            770535                 5                    5                    21              101        0.6000
## 5            Ayacucho   3   0  0  0  0  0  0   UPP         1            306662                 3                    3                    20               58        1.0000
## 6           Cajamarca   2   1  1  1  0  0  0   UPP         1            721239                 5                    5                    23              109        0.4000
## 7              Callao   1   2  1  0  0  0  0   PAP         2            541730                 4                    4                    24               92        0.5000
## 8               Cusco   4   1  0  0  0  0  0   UPP         1            643629                 5                    5                    22               98        0.8000
## 9        Huancavelica   2   0  0  0  0  0  0   UPP         1            203844                 2                    3                    15               39        1.0000
## 10            Huánuco   2   1  0  0  0  0  0   UPP         1            354416                 3                    3                    22               65        0.6667
## 11                Ica   1   2  1  0  0  0  0   PAP         2            451197                 4                    5                    22               88        0.5000
## 12              Junín   2   1  1  1  0  0  0   UPP         1            701190                 5                    5                    22               99        0.4000
## 13        La Libertad   1   5  1  0  0  0  0   PAP         2            942656                 7                    7                    22              145        0.7143
## 14         Lambayeque   1   2  1  1  0  0  0   PAP         2            676735                 5                    5                    22              101        0.4000
## 15               Lima   6   7  8  8  3  2  1    UN         3           6063109                35                   35                    24              738        0.2286
## 16             Loreto   1   1  0  0  1  0  0   UPP         1            416419                 3                    3                    22               60        0.3333
## 17      Madre de Dios   0   0  0  0  0  0  1    RN         7             47742                 1                    3                    14               35        1.0000
## 18           Moquegua   1   1  0  0  0  0  0   UPP         1             99962                 2                    3                    18               44        0.5000
## 19              Pasco   1   0  0  1  0  0  0    AF         4            135670                 2                    3                    17               51        0.5000
## 20              Piura   2   3  1  0  0  0  0   PAP         2            914912                 6                    6                    23              136        0.5000
## 21               Puno   3   1  0  0  1  0  0   UPP         1            674865                 5                    5                    23              106        0.6000
## 22         San Martín   1   1  0  1  0  0  0   UPP         1            357124                 3                    3                    17               47        0.3333
## 23              Tacna   1   1  0  0  0  0  0   UPP         1            172427                 2                    3                    18               57        0.5000
## 24             Tumbes   1   1  0  0  0  0  0   UPP         1            110335                 2                    3                    19               57        0.5000
## 25            Ucayali   1   1  0  0  0  0  0   UPP         1            201342                 2                    3                    22               60        0.5000
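The `if` chain above finds, row by row, the column holding the most seats. A minimal vectorized sketch of the same idea, using base R's `max.col()` on three regions copied from the output above (Amazonas, Lima and Pasco). With `ties.method = "first"` ties go to the leftmost column, so putting UPP first means any tie involving UPP goes to UPP, as in the loop; the manual tie-breaks are then applied the same way. The data frame `seats` and its column `Seats_in_Congress` are illustrative names, not the scraped objects:

```r
# Seats won per party in three regions, copied from the merged table above
seats <- data.frame(
  Electoral_District = c("Amazonas", "Lima", "Pasco"),
  UPP = c(1, 6, 1), PAP = c(1, 7, 0), UN = c(0, 8, 0), AF = c(0, 8, 1),
  FC = c(0, 3, 0), PP = c(0, 2, 0), RN = c(0, 1, 0),
  Seats_in_Congress = c(2, 35, 2)
)
parties <- c("UPP", "PAP", "UN", "AF", "FC", "PP", "RN")
m <- as.matrix(seats[, parties])

# column with the most seats in each row; ties go to the leftmost column
seats$group <- parties[max.col(m, ties.method = "first")]

# the same manual tie-breaks as in the loop
seats$group[seats$Electoral_District == "Lima"] <- "UN"
seats$group[seats$Electoral_District == "Pasco"] <- "AF"

# party code = position among the party columns (UPP = 1, ..., RN = 7)
seats$groupcode <- match(seats$group, parties)

# seats won by the winning party divided by the seats at stake in the region
seats$percentWinner <- m[cbind(seq_len(nrow(m)), seats$groupcode)] / seats$Seats_in_Congress
seats[, c("Electoral_District", "group", "groupcode", "percentWinner")]
```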

We have merged both tables into one, but some further work is needed to prepare the chart. In particular, each party's seats are in a separate column, while the chart needs the party to be a single factor, so we convert this “wide” table into a “long” one:

options(width = 180)
names(table)[1] = "Electoral_District"
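# region order for the chart: winning party code (descending), then the winner's share of seats, then name
x = table[order(-table[, 10], table[, 16], table[, 1]), ]
a = as.character(x$Electoral_District)
# one row per region and party, with the seats in a single Seats_Won column
l <- reshape(table, varying = c("UPP", "PAP", "UN", "AF", "FC", "PP", "RN"), v.names = "Seats_Won", timevar = "Party", times = c("UPP", "PAP", "UN", "AF", "FC", "PP", "RN"), direction = "long")
# keep only the region-party pairs with at least one seat
l <- l[!(l$Seats_Won == 0), ]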
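To see what the `reshape()` call does, here is a toy version with two regions and three parties, using the values for Amazonas and Lima from the merged table above. Unlike the call above, this sketch passes `idvar`, so the region name, rather than a generated `id` column, identifies each row; `wide` and `long` are illustrative names:

```r
# Two regions and three parties, values copied from the merged table above
wide <- data.frame(Electoral_District = c("Amazonas", "Lima"),
                   UPP = c(1, 6), PAP = c(1, 7), UN = c(0, 8))

# one row per region and party; the seats go to a single Seats_Won column
long <- reshape(wide, varying = c("UPP", "PAP", "UN"), v.names = "Seats_Won",
                timevar = "Party", times = c("UPP", "PAP", "UN"),
                idvar = "Electoral_District", direction = "long")

# drop region-party pairs with no seats (here, UN in Amazonas)
long <- long[long$Seats_Won != 0, ]
long
```

The long table is stacked party by party (all UPP rows, then PAP, then UN), and `Party` now holds what used to be column names.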

Now we are ready to present our chart, using ggplot2:

library(ggplot2)

# colours named by party, so each party keeps its colour whatever the factor or break order
# (UPP, the opposition, in red and PAP, the governing party, in blue, as in the first chart)
cols = c(AF = "orange", FC = "black", PAP = "blue", PP = "yellow", RN = "magenta", UN = "green", UPP = "red")
congPlot = ggplot(l, aes(Electoral_District, Seats_Won, fill = Party))
congPlot = congPlot + geom_bar(position = "stack", stat = "identity") + coord_flip()
congPlot = congPlot + xlim(a)
congPlot = congPlot + ggtitle("Distribution of Seats won\n per Party by Region\n(2006 Congress Election - PERU)\n")

# each note is drawn twice with a small offset to create a shadow
congPlot = congPlot + annotate("text", x = 17.95, y = 19.95, label = "DO YOU SEE \nA STRUCTURAL PROBLEM?", color = "black", size = 5)
congPlot = congPlot + annotate("text", x = 18, y = 20, label = "DO YOU SEE \nA STRUCTURAL PROBLEM?", color = "#696969", size = 5)
congPlot = congPlot + annotate("text", x = 12, y = 20, label = "SMALL AND LOCAL PARTIES \nGETTING\n COUNTRY WIDE\n REPRESENTATION", color = "black", size = 6)
congPlot = congPlot + annotate("text", x = 11.95, y = 19.95, label = "SMALL AND LOCAL PARTIES \nGETTING\n COUNTRY WIDE\n REPRESENTATION", color = "red", size = 6)
congPlot = congPlot + scale_fill_manual(values = cols, breaks = c("UPP", "PAP", "UN", "AF", "FC", "PP", "RN"))
congPlot + theme(axis.text.y = element_text(colour = "black", size = 10, angle = 0, hjust = 1, vjust = 0, face = "bold"), axis.text.x = element_text(colour = "grey10", size = 18, hjust = 0.5, 
    vjust = 0.5, face = "plain"))

[Figure: Distribution of Seats won per Party by Region (2006 Congress Election - PERU)]

Our purpose here was:

To present a distribution of seats per party in every Region.
To ease the comparison of the seats assigned to Lima with those of the rest of the country.
To invite the reader to reflect on whether the Peruvian Congress represents the country.
To show that parties with no presence in most of the country sit in Congress thanks to the congressmen elected in Lima, even though the actions of any congressman affect the whole country.
To show that the opposition party is present in most of the country, although that may not be enough, since it is weakly represented in Lima.
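The dependence of some parties on Lima can also be quantified: for each party, the share of its seats won in Lima. A short sketch, with the Lima row and the party totals (column sums) copied by hand from the merged table above; `lima`, `total` and `fromLima` are illustrative names:

```r
# Seats per party in Lima and in the whole country, from the merged table above
lima  <- c(UPP = 6,  PAP = 7,  UN = 8,  AF = 8,  FC = 3, PP = 2, RN = 1)
total <- c(UPP = 45, PAP = 36, UN = 17, AF = 13, FC = 5, PP = 2, RN = 2)

# share of each party's seats that were won in Lima
fromLima <- round(lima / total, 2)
sort(fromLima, decreasing = TRUE)

# Lima's share of all 120 seats, as a reference point
sum(lima) / sum(total)
```

Parties well above Lima's overall share of roughly 29% depend on Lima more than Congress as a whole does; PP, for instance, won all of its seats there, while the opposition (UPP) won only a small fraction of its seats in Lima.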