Author: Luca Perer
Date: June 1, 2014
Before plotting the clusters, I first standardised the data set and then ran hierarchical clustering on the food variables. The dendrogram shows that the 25 countries fall into five clearly separated groups.
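The standardisation and distance steps are not shown in this write-up. A minimal sketch of how they could look follows, assuming the country name sits in the first column and every other column is a food variable. The `diet_toy` data frame and its values are made up for illustration and are not the real figures.

```r
# Toy stand-in for the diet data: made-up values, NOT the real figures.
diet_toy <- data.frame(
  Country = c("A", "B", "C", "D", "E", "F"),
  RedMeat = c(10, 9, 14, 13, 7, 6),
  Fish    = c(0.5, 1.0, 3.0, 4.0, 8.0, 7.0),
  Cereals = c(50, 48, 25, 27, 30, 33)
)

# scale() centres each column to mean 0 and rescales it to sd 1,
# so no single food dominates the distances because of its range.
diet_scaled <- scale(diet_toy[, -1])

# Euclidean distance between every pair of countries (assumed metric).
distances <- dist(diet_scaled, method = "euclidean")
```

Standardising first matters here because the food variables sit on very different ranges, and an unscaled distance would be driven mostly by the largest one.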
dietcluster = hclust(distances, method = "ward.D")  # "ward" was renamed "ward.D" in R 3.1.0
plot(dietcluster, labels = diet$Country)
rect.hclust(dietcluster, k = 5)
With this five-cluster solution we can quickly see which countries belong to which cluster.
## diet.Country.ord. diet.hcluster.ord.
## 1 Albania 1
## 2 Bulgaria 1
## 3 Romania 1
## 4 Yugoslavia 1
## 5 Austria 2
## 6 Belgium 2
## 7 France 2
## 8 Ireland 2
## 9 Netherlands 2
## 10 Switzerland 2
## 11 UK 2
## 12 WGermany 2
## 13 Czechoslovakia 3
## 14 EGermany 3
## 15 Hungary 3
## 16 Poland 3
## 17 USSR 3
## 18 Denmark 4
## 19 Finland 4
## 20 Norway 4
## 21 Sweden 4
## 22 Greece 5
## 23 Italy 5
## 24 Portugal 5
## 25 Spain 5
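A listing like the one above can be produced by cutting the tree into groups and sorting the countries by cluster label. The sketch below is one way to do that, not necessarily the code used here. It runs on made-up toy data with k = 3, and `"ward.D"` is the current name for the method older R versions called `"ward"`.

```r
# Toy stand-in for the diet data: made-up values, NOT the real figures.
diet_toy <- data.frame(
  Country = c("A", "B", "C", "D", "E", "F"),
  RedMeat = c(10, 9, 14, 13, 7, 6),
  Fish    = c(0.5, 1.0, 3.0, 4.0, 8.0, 7.0),
  Cereals = c(50, 48, 25, 27, 30, 33)
)

# Standardise, compute distances, and build the tree.
toy_cluster <- hclust(dist(scale(diet_toy[, -1])), method = "ward.D")

# cutree() assigns each row to one of k groups (k = 3 for the toy data).
diet_toy$hcluster <- cutree(toy_cluster, k = 3)

# Sort the countries by cluster label and list them side by side.
ord <- order(diet_toy$hcluster)
membership <- data.frame(diet_toy$Country[ord], diet_toy$hcluster[ord])
print(membership)
```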
We can now compare the consumption of foods across all 25 countries. Looking at the cluster means, it becomes immediately apparent that there are different eating styles. We can categorize them as follows:
Segment 1: Grains = High consumption of: Cereals and Nuts. Low consumption of: Milk, Fish, and Starch.
Segment 2: Meat Eaters = High: Red Meat, White Meat, Eggs, Milk.
Segment 3: Potato Diet = High: Starch, White Meat, Fr.Veg. Low: Red Meat.
Segment 4: Coastal Foods = High: Eggs, Milk, Fish. Low: Cereals, Nuts, Fr.Veg.
Segment 5: Light & Healthy = High: Fruits & Vegetables, Fish, Nuts.
## Group.1 RedMeat WhiteMeat Eggs Milk Fish Cereals Starch Nuts Fr.Veg
## 1 1 7.125 4.675 1.200 9.45 0.750 51.12 1.950 5.050 2.975
## 2 2 13.213 10.637 3.987 21.16 3.375 24.70 4.650 2.062 4.175
## 3 3 7.920 10.040 2.840 13.84 2.740 35.74 5.560 2.540 4.260
## 4 4 9.850 7.050 3.150 26.68 8.225 22.68 4.550 1.175 2.125
## 5 5 8.125 3.800 2.475 11.20 7.625 33.67 3.975 5.675 7.075
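The `Group.1` column in the table above is the default name `aggregate()` gives to an unnamed grouping list, so the means were probably computed along these lines. This is a hedged sketch on made-up toy data with hard-coded, hypothetical cluster labels.

```r
# Toy data with hypothetical cluster labels: made-up values, NOT the real figures.
diet_toy <- data.frame(
  Country  = c("A", "B", "C", "D", "E", "F"),
  RedMeat  = c(10, 9, 14, 13, 7, 6),
  Fish     = c(0.5, 1.0, 3.0, 4.0, 8.0, 7.0),
  Cereals  = c(50, 48, 25, 27, 30, 33),
  hcluster = c(1, 1, 2, 2, 3, 3)
)

# Mean of each food variable within each cluster.
# An unnamed list as the grouping argument yields a "Group.1" column.
food_cols <- c("RedMeat", "Fish", "Cereals")
cluster_means <- aggregate(diet_toy[, food_cols], list(diet_toy$hcluster), mean)
print(cluster_means)
```

Reading across a row of the result gives one cluster's profile, and reading down a column shows which cluster is highest or lowest for that food.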
The next step is to look at the relationships between foods to see if there are patterns. Portugal raises the mean for both fish and fruits & vegetables because it is the only country that eats a lot of both.
Using the qplot function, I created visualizations per state of Murders, Gun Murders, Population, and Murder Rate.
Plot 1: Total Murders
The number of murders in the United States is very high. This visualization shows where most murders occur. The darkest shades of red represent the largest numbers of murders per state. It is easy to see that California, Texas and Florida all have large numbers of murders. To make this information more meaningful, the next step is to assess whether these murders were committed with guns.
Plot 2: Gun Murders
As we can see in this next visualization, the shading has remained largely consistent. California, Texas and Florida also have many gun murders.
Plot 3: Population Size
The information from the previous two plots may be skewed because the populations of those states are so much larger than those of other states. I confirm this with Plot 3, then normalize the data by creating a new variable that holds the murder rate.
Plot 4: Murder Rates
To compute the murder rate relative to total population, I created a new variable. This lets us see which states actually have the highest murder rates independent of population size.
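The formula for the new variable is not given. A minimal sketch follows, assuming columns named `Murders` and `Population` (hypothetical names) and a rate expressed per 100,000 residents, which is a common convention but an assumption here. The states and counts are made up.

```r
# Toy stand-in for the state data: made-up values, NOT real statistics.
murders_toy <- data.frame(
  State      = c("S1", "S2", "S3"),
  Murders    = c(1200, 90, 15),
  Population = c(30000000, 1500000, 600000)
)

# Murders per 100,000 residents (assumed scaling).
murders_toy$MurderRate <- murders_toy$Murders / murders_toy$Population * 100000

# Rank states by rate instead of by raw count.
ranked <- murders_toy[order(-murders_toy$MurderRate), ]
print(ranked)
```

In the toy data, S1 has by far the most murders but S2 has the highest rate, which is the kind of reversal the normalized plot is meant to expose.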