Hierarchical Clustering and Map Visualization

Author: Luca Perer
Date: June 1, 2014

PART 1

1) Clustering Countries by Food Type [Steps 1-6]

Before plotting the clusters, I first standardised the data set. I then ran hierarchical clustering on the food variables. The resulting dendrogram shows that the 25 countries fall into five clearly separated groups.

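distances = dist(scale(diet[, names(diet) != "Country"]))  # Euclidean distances between standardised food profiles
dietcluster = hclust(distances, method = "ward.D")  # "ward" was renamed "ward.D" in R 3.1.0
plot(dietcluster, labels = diet$Country)
rect.hclust(dietcluster, k = 5)  # outline the five clusters on the dendrogram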

[Figure: cluster dendrogram of the 25 countries with the five clusters outlined]

With this five-cluster solution, we can quickly see which countries belong to which cluster.

##              Country            Cluster
## 1            Albania                  1
## 2           Bulgaria                  1
## 3            Romania                  1
## 4         Yugoslavia                  1
## 5            Austria                  2
## 6            Belgium                  2
## 7             France                  2
## 8            Ireland                  2
## 9        Netherlands                  2
## 10       Switzerland                  2
## 11                UK                  2
## 12          WGermany                  2
## 13    Czechoslovakia                  3
## 14          EGermany                  3
## 15           Hungary                  3
## 16            Poland                  3
## 17              USSR                  3
## 18           Denmark                  4
## 19           Finland                  4
## 20            Norway                  4
## 21            Sweden                  4
## 22            Greece                  5
## 23             Italy                  5
## 24          Portugal                  5
## 25             Spain                  5
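The steps above (standardise, compute distances, cluster, cut the tree into five groups) can be collected into one function. This is a minimal sketch, assuming the data frame holds a `Country` column plus numeric food columns only; `cluster_countries` is a hypothetical helper name, not part of the original analysis. It uses `cutree()` to turn the dendrogram into one cluster label per country, which is one way a table like the one above can be produced.

```r
# Hypothetical helper (not from the original analysis).
# Assumes `df` has a `Country` column and otherwise only numeric food columns.
cluster_countries = function(df, k = 5) {
  foods = df[, setdiff(names(df), "Country"), drop = FALSE]
  scaled = scale(foods)                          # centre each food at 0, scale to sd 1
  distances = dist(scaled, method = "euclidean")
  tree = hclust(distances, method = "ward.D")    # called "ward" before R 3.1.0
  groups = cutree(tree, k = k)                   # one cluster id per country
  out = data.frame(Country = df$Country, Cluster = as.integer(groups),
                   stringsAsFactors = FALSE)
  out = out[order(out$Cluster, out$Country), ]   # list countries cluster by cluster
  rownames(out) = NULL
  out
}
```

Called as `cluster_countries(diet, k = 5)`, it should give the same grouping as the table above, provided `diet` contains only `Country` and the nine food columns.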

2) Plotting Average Consumption [Step 7]

We can now compare food consumption across the five clusters, which together cover all 25 countries. Looking at the cluster means, it is immediately apparent that each cluster has a different eating style. We can categorize them as follows:

Segment 1: Grains = High: Cereals, Nuts. Low: Milk, Fish, Starch.

Segment 2: Meat Eaters = High: Red Meat, White Meat, Eggs, Milk.

Segment 3: Potato Diet = High: Starch, White Meat, Fr.Veg. Low: Red Meat.

Segment 4: Coastal Foods = High: Eggs, Milk, Fish. Low: Cereals, Nuts, Fr.Veg.

Segment 5: Light & Healthy = High: Fruits & Vegetables, Fish, Nuts.

##   Cluster RedMeat WhiteMeat  Eggs  Milk  Fish Cereals Starch  Nuts Fr.Veg
## 1       1   7.125     4.675 1.200  9.45 0.750   51.12  1.950 5.050  2.975
## 2       2  13.213    10.637 3.987 21.16 3.375   24.70  4.650 2.062  4.175
## 3       3   7.920    10.040 2.840 13.84 2.740   35.74  5.560 2.540  4.260
## 4       4   9.850     7.050 3.150 26.68 8.225   22.68  4.550 1.175  2.125
## 5       5   8.125     3.800 2.475 11.20 7.625   33.67  3.975 5.675  7.075
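A table like the one above can be produced with `aggregate()`, which names its grouping column `Group.1` by default. The sketch below averages each food within each cluster and then lists, for each food, which cluster eats the most and the least of it, which is the comparison the segment labels rest on. `cluster_means` and `extreme_clusters` are hypothetical helper names, and the data frame is assumed to hold `Country` plus numeric food columns.

```r
# Hypothetical helper: mean of every food column within each cluster.
cluster_means = function(df, clusters) {
  foods = df[, setdiff(names(df), "Country"), drop = FALSE]
  means = aggregate(foods, by = list(clusters), FUN = mean)
  names(means)[1] = "Cluster"   # aggregate() calls this column "Group.1"
  means
}

# Hypothetical helper: for each food, the cluster with the highest and lowest mean.
extreme_clusters = function(means) {
  data.frame(Food = names(means)[-1],
             Highest = means$Cluster[sapply(means[-1], which.max)],
             Lowest = means$Cluster[sapply(means[-1], which.min)],
             row.names = NULL, stringsAsFactors = FALSE)
}
```

For example, `extreme_clusters(cluster_means(diet, cutree(dietcluster, k = 5)))` gives one row per food, which makes the "High" and "Low" lists for each segment easy to check.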

3) Visualizing Consumption of Foods [Steps 8-9]

The next step is to look at the relationships between foods to see whether there are patterns. Portugal pulls up the Segment 5 means for both Fish and Fruits & Vegetables, because it is the only country that eats a lot of both.

[Figures: two plots comparing consumption of foods across countries]
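One way to check how much a single country moves its cluster's average is to compute the mean with and without it. This is a minimal sketch; `mean_with_without` is a hypothetical helper, and the cluster labels are assumed to come from `cutree()`.

```r
# Hypothetical helper: mean of `food` in the cluster containing `country`,
# computed with and without that country.
mean_with_without = function(df, clusters, country, food) {
  stopifnot(sum(df$Country == country) == 1)   # country must appear exactly once
  k = clusters[df$Country == country]          # cluster that contains `country`
  members = clusters == k
  with_it = mean(df[[food]][members])
  without_it = mean(df[[food]][members & df$Country != country])
  c(with = with_it, without = without_it)
}
```

For example, `mean_with_without(diet, cutree(dietcluster, k = 5), "Portugal", "Fish")` returns both means for Portugal's cluster, so the size of the shift can be read off directly.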

PART 2: Map Visualization [Steps 1-6]

Using the qplot function from ggplot2, I created state-level map visualizations of murders, gun murders, population, and murder rate.
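Drawing a state map requires attaching each state's values to the map polygons. This is a minimal sketch, assuming the murder data has a `State` column with full state names and the polygons come from ggplot2's `map_data("state")`, which needs the maps package and uses lower-case `region` names; `join_states_to_map` is a hypothetical helper.

```r
# Hypothetical helper: attach state-level values to map polygons.
# `map_df` is assumed to look like ggplot2::map_data("state"):
# columns long, lat, group, order and a lower-case `region`.
join_states_to_map = function(state_df, map_df) {
  state_df$region = tolower(state_df$State)         # "New York" -> "new york"
  merged = merge(map_df, state_df, by = "region")   # inner join: states without polygons are dropped
  merged = merged[order(merged$order), ]            # merge() reorders rows; restore drawing order
  rownames(merged) = NULL
  merged
}
```

With ggplot2 loaded, `murderMap = join_states_to_map(murders, map_data("state"))` followed by `qplot(long, lat, data = murderMap, geom = "polygon", group = group, fill = Murders)` would draw a map like Plot 1; `murders` and `murderMap` are assumed names. Since `map_data("state")` covers only the contiguous states, Alaska and Hawaii drop out of the join.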

Plot 1: Total Murders
The United States has a high number of murders, and this visualization shows where most of them occur. The darkest shades of red mark the states with the largest number of murders: California, Texas and Florida clearly stand out. To make this information more meaningful, the next step is to assess whether these murders were committed with guns.
[Figure: map of total murders by state]

Plot 2: Gun Murders
The next visualization shows that the shading stays largely the same: California, Texas and Florida also have many gun murders.
[Figure: map of gun murders by state]

Plot 3: Population Size
The previous two maps may be skewed because these states have much larger populations than smaller states. Plot 3 confirms this; I then normalize the data by creating a new variable that gives each state's murder rate.

[Figure: map of population by state]

Plot 4: Murder Rates
To obtain the murder rate relative to each state's total population, I created a new variable (murders divided by population). This shows which states actually have the highest murder rates, independent of population size.

[Figure: map of murder rate by state]
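The murder-rate variable divides murders by population. This is a minimal sketch, assuming `Murders` and `Population` columns and a rate per 100,000 residents (the scale is an assumption; any constant factor gives the same ranking); `add_murder_rate` is a hypothetical helper. Sorting by the new column shows how the ranking can change once population is taken into account.

```r
# Hypothetical helper: murders per `per` residents, states sorted by that rate.
add_murder_rate = function(state_df, per = 100000) {
  state_df$MurderRate = state_df$Murders / state_df$Population * per
  ranked = state_df[order(state_df$MurderRate, decreasing = TRUE), ]  # highest rate first
  rownames(ranked) = NULL
  ranked
}
```

With made-up numbers, a state with 1,000 murders and 40 million residents has a rate of 2.5, which ranks below a state with 100 murders and 1 million residents at a rate of 10, even though its raw count is ten times larger.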