I obtained data from Gapminder on deaths from interpersonal violence in 2002 and 2004 and on total population in 2004. I then kept only the cases with no missing data.
The data came from two separate spreadsheets downloaded from Gapminder: one for population and one for interpersonal violence. The population spreadsheet covered more than 200 years and, because it included countries that no longer exist, had many more entries than the violence data. I merged the data by hand in Numbers on the Mac, which may have been a mistake, since Numbers does not seem to have an option for deleting individual cells.
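The same merge can be scripted in R rather than done by hand, which also makes it repeatable. Below is a minimal sketch using base R's merge(). It assumes both sheets have already been read into data frames that share a Country column. The column names follow the CSV used below, the Albania and Andorra figures come from its head() output, and the third country is a hypothetical placeholder:

```r
# Hypothetical stand-ins for the two Gapminder sheets. Albania and Andorra use the
# figures printed by head() in this report; "Former State" is a made-up placeholder.
population <- data.frame(
  Country = c("Albania", "Andorra", "Former State"),
  population = c(3124861, 75292, NA),
  stringsAsFactors = FALSE
)
violence <- data.frame(
  Country = c("Albania", "Andorra"),
  Death_02 = c(187, 1),
  Death_04 = c(208, 1),
  stringsAsFactors = FALSE
)

# Inner join on the shared key: a country present in only one sheet
# (such as a state that no longer exists) is dropped automatically
merged <- merge(population, violence, by = "Country")
merged
```

By default merge() keeps only the countries found in both sheets. Setting all.x = TRUE would instead keep every row of population and fill the violence columns with NA.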
The assignment is to create 2-5 plots that use the techniques from Lesson 3: simple histograms, boxplots split over a categorical variable, and frequency polygons. The assignment also requires saving the plots, posting them to the discussion board, and submitting my code on the Udacity website.
I will begin with a histogram of worldwide murders in 2004.
The first thing I notice is that I have not followed the naming conventions from the Fox and Weisberg text, which capitalize the names of data sets and keep variable names in lowercase. Indeed, I seem to have done just the opposite.
#install.packages("knitr")
#install.packages("ggplot2")
library(knitr)
library(ggplot2)
worldHomicide <- read.csv("/Users/michaelreinhard/Google Drive/R/worldHomicide.csv")
head(worldHomicide)
## Country population Death_02 Death_04 deathRate04 dper100k
## 1 Afghanistan 26693486 916 813 3.046e-05 3.046
## 2 Albania 3124861 187 208 6.656e-05 6.656
## 3 Algeria 32396048 3745 3102 9.575e-05 9.575
## 4 Andorra 75292 1 1 1.328e-05 1.328
## 5 Angola 15957460 5217 6226 3.902e-04 39.016
## 6 Antigua and Barbuda 82838 7 6 7.243e-05 7.243
WrdMurder <- as.data.frame(worldHomicide) # read.csv() already returns a data frame, so this just makes a copy; no table() step is needed in between.
head(WrdMurder)
## Country population Death_02 Death_04 deathRate04 dper100k
## 1 Afghanistan 26693486 916 813 3.046e-05 3.046
## 2 Albania 3124861 187 208 6.656e-05 6.656
## 3 Algeria 32396048 3745 3102 9.575e-05 9.575
## 4 Andorra 75292 1 1 1.328e-05 1.328
## 5 Angola 15957460 5217 6226 3.902e-04 39.016
## 6 Antigua and Barbuda 82838 7 6 7.243e-05 7.243
#Now make variable names lower case
names(WrdMurder) <- tolower(names(WrdMurder))
names(WrdMurder)
## [1] "country" "population" "death_02" "death_04" "deathrate04"
## [6] "dper100k"
names(WrdMurder) <- c("nat","pop","death02","death04","dRate04","d100k")
head(WrdMurder)
## nat pop death02 death04 dRate04 d100k
## 1 Afghanistan 26693486 916 813 3.046e-05 3.046
## 2 Albania 3124861 187 208 6.656e-05 6.656
## 3 Algeria 32396048 3745 3102 9.575e-05 9.575
## 4 Andorra 75292 1 1 1.328e-05 1.328
## 5 Angola 15957460 5217 6226 3.902e-04 39.016
## 6 Antigua and Barbuda 82838 7 6 7.243e-05 7.243
names(WrdMurder)
## [1] "nat" "pop" "death02" "death04" "dRate04" "d100k"
I had meant to attach the data set to save typing, but the code below refers to columns with the $ operator instead. I also check how many cases remain after the missing data are removed.
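If the aim is just to save typing, base R's with() offers the same convenience as attach(). It evaluates an expression using a data frame's columns as variables and leaves nothing on the search path afterwards. Here is a small sketch with a hypothetical data frame that borrows two of this report's column names:

```r
# Hypothetical data frame using two of the column names from WrdMurderNA
toy <- data.frame(pop = c(100, 200, 400), death04 = c(1, 3, 2))

# Inside with(), pop and death04 refer to toy$pop and toy$death04
total_deaths <- with(toy, sum(death04))
largest_pop <- with(toy, max(pop))

total_deaths  # 6
largest_pop   # 400
```

On the real data the same pattern would be, for example, with(WrdMurderNA, summary(d100k)).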
WrdMurderNA <- na.omit(WrdMurder)
head(WrdMurderNA)
## nat pop death02 death04 dRate04 d100k
## 1 Afghanistan 26693486 916 813 3.046e-05 3.046
## 2 Albania 3124861 187 208 6.656e-05 6.656
## 3 Algeria 32396048 3745 3102 9.575e-05 9.575
## 4 Andorra 75292 1 1 1.328e-05 1.328
## 5 Angola 15957460 5217 6226 3.902e-04 39.016
## 6 Antigua and Barbuda 82838 7 6 7.243e-05 7.243
summary(WrdMurderNA)
## nat pop death02
## Afghanistan : 1 Min. :1.73e+03 Min. : 0
## Albania : 1 1st Qu.:1.35e+06 1st Qu.: 57
## Algeria : 1 Median :6.47e+06 Median : 345
## Andorra : 1 Mean :3.34e+07 Mean : 2921
## Angola : 1 3rd Qu.:2.15e+07 3rd Qu.: 1618
## Antigua and Barbuda: 1 Max. :1.30e+09 Max. :57516
## (Other) :185
## death04 dRate04 d100k
## Min. : 0 Min. :0.00e+00 Min. : 0.00
## 1st Qu.: 63 1st Qu.:1.92e-05 1st Qu.: 1.92
## Median : 328 Median :6.59e-05 Median : 6.59
## Mean : 3129 Mean :1.08e-04 Mean :10.76
## 3rd Qu.: 2062 3rd Qu.:1.63e-04 3rd Qu.:16.27
## Max. :61229 Max. :8.64e-04 Max. :86.39
##
str(WrdMurderNA)
## 'data.frame': 191 obs. of 6 variables:
## $ nat : Factor w/ 195 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ pop : int 26693486 3124861 32396048 75292 15957460 82838 38340778 3062612 20103822 8185553 ...
## $ death02: int 916 187 3745 1 5217 7 3329 112 284 75 ...
## $ death04: int 813 208 3102 1 6226 6 2596 100 253 63 ...
## $ dRate04: num 3.05e-05 6.66e-05 9.58e-05 1.33e-05 3.90e-04 ...
## $ d100k : num 3.05 6.66 9.58 1.33 39.02 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:4] 25 85 153 191
## .. ..- attr(*, "names")= chr [1:4] "25" "85" "153" "191"
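The na.action attribute at the end of the str() output records which rows na.omit() removed: rows 25, 85, 153 and 191 here. The following sketch uses a small hypothetical data frame to show how that attribute can be turned into a count and a list of the dropped cases:

```r
# Hypothetical data frame with two incomplete rows
toy <- data.frame(
  nat = c("A", "B", "C", "D"),
  pop = c(100, NA, 300, 400),
  death04 = c(1, 2, NA, 4),
  stringsAsFactors = FALSE
)
toy_complete <- na.omit(toy)

# na.omit() stores the original row numbers it removed in the "na.action" attribute;
# as.integer() strips the names and class so the result is a plain index vector
dropped <- as.integer(attr(toy_complete, "na.action"))

nrow(toy) - nrow(toy_complete)  # number of cases lost: 2
toy$nat[dropped]                # which ones: "B" "C"
```

On this data set, WrdMurder$nat[attr(WrdMurderNA, "na.action")] should name the four countries that were dropped.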
hist(WrdMurderNA$pop)
hist(WrdMurderNA$d100k)
The first thing we see is that both distributions appear strongly skewed to the right.
Now I switch over to ggplot.
p <- ggplot(data=WrdMurderNA, aes(x=d100k))
p + geom_bar()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
p + geom_histogram(binwidth = 1)
p + geom_density()
WrdMurderNA_large <- subset(WrdMurderNA, pop >= 1000000)
I have imported the data. Now I want to inspect the basic properties of the data set. There are 191 cases in the NA-purged data set and 149 in the subset of countries with at least one million people.
summary(WrdMurderNA$pop)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.73e+03 1.35e+06 6.47e+06 3.34e+07 2.15e+07 1.30e+09
class(WrdMurderNA$pop)
## [1] "integer"
summary(WrdMurderNA$death04)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 63 328 3130 2060 61200
This is a good illustration of how right-skewed data pull the mean above the median: the mean number of deaths (3,130) is almost ten times the median (328).
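Comparing the mean with the median is a quick check. A moment-based skewness coefficient goes further and puts a number on the asymmetry. The skewness() helper below is defined here for illustration rather than taken from a package, and the data are hypothetical:

```r
# Moment-based sample skewness: mean cubed deviation divided by the cubed
# standard deviation (computed with denominator n). Positive means right skew.
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

x <- c(1, 2, 3, 4, 100)  # hypothetical values with one very large observation
mean(x)      # 22
median(x)    # 3
skewness(x)  # positive
skewness(c(1, 2, 3, 4, 5))  # 0 for symmetric data
```

Applied to the real data, skewness(WrdMurderNA$death04) should come out positive.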
summary(WrdMurderNA$d100k)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 1.92 6.59 10.80 16.30 86.40
WrdMurderNA[186,]
## nat pop death02 death04 dRate04 d100k
## 189 Vanuatu 205561 3 2 9.729e-06 0.9729
WrdMurderNA[185,]
## nat pop death02 death04 dRate04 d100k
## 188 Uzbekistan 25708188 951 921 3.583e-05 3.583
WrdMurderNA[183,]
## nat pop death02 death04 dRate04 d100k
## 186 United States 294063120 15726 17647 6.001e-05 6.001
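# Look up countries by name rather than by row number; without attach(), nat must be written as WrdMurderNA$nat
WrdMurderNA[WrdMurderNA$nat == "Afghanistan",]
WrdMurderNA[WrdMurderNA$nat == "United States",]
WrdMurderNA[WrdMurderNA$nat == "United Kingdom",]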
I want to do a few things with this data set.
First I create a simple histogram of the world's homicides, adjusting the bin width to find a good trade-off between granularity and seeing the overall pattern. Then I create a single boxplot to identify the outliers. I also create a violin plot.
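To get the outliers as a list rather than reading them off a picture, base R's boxplot.stats() returns the points a boxplot would draw individually. Here is a sketch with hypothetical murder rates:

```r
# Hypothetical murder rates per 100,000 for six countries
rates <- data.frame(
  nat = c("A", "B", "C", "D", "E", "F"),
  d100k = c(1, 2, 3, 4, 5, 100),
  stringsAsFactors = FALSE
)

# boxplot.stats() flags values lying more than 1.5 times the hinge spread
# beyond either hinge: the same points boxplot() plots individually
out_values <- boxplot.stats(rates$d100k)$out
flagged <- rates$nat[rates$d100k %in% out_values]
flagged  # "F"
```

With the real data this would be boxplot.stats(WrdMurderNA$d100k)$out. In ggplot2 a single boxplot can be drawn with ggplot(WrdMurderNA, aes(x = "", y = d100k)) + geom_boxplot(), and swapping in geom_violin() gives the violin plot. geom_boxplot() uses the same 1.5 times IQR convention but computes the quartiles slightly differently, so with small samples the flagged points can differ.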
pMurder <- ggplot(aes(x = death04), data = WrdMurderNA)
pMurder + geom_histogram(binwidth = 100)
pMurder + geom_histogram(binwidth = 1, fill = "steelblue") + xlim(0,50)
pMurder + geom_histogram(binwidth = 1000, fill = "steelblue") + xlim(5000,70000)
Next, I look at the population variable to see how population size is distributed among the nations of the world. I first create a general histogram and experiment with a few different bin sizes.
pPop <- ggplot(aes(x = pop), data = WrdMurderNA)
pPop + geom_histogram()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
pPop + geom_histogram(binwidth = diff(range(WrdMurderNA$pop)) / 10) # range() returns c(min, max); diff() turns that into the width, giving about 10 bins
pPop + geom_density(fill = "steelblue")
ggsave('population_density.jpg')
## Saving 7 x 5 in image
Next, I attempt to fit a power-law distribution to the histogram to get a sense of the most appropriate model for the distribution of population.
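No fitting code follows this step. One standard approach is the maximum-likelihood estimate of the exponent of a continuous power law above a cutoff xmin. That interpretation of "fitting a power law" is an assumption on my part. The function name and the cutoff are hypothetical choices, and the check below uses simulated data rather than the Gapminder figures:

```r
# Maximum-likelihood estimate of alpha for a continuous power law
# p(x) proportional to x^(-alpha), x >= xmin:
#   alpha_hat = 1 + n / sum(log(x / xmin))
# Only values at or above the analyst-chosen cutoff xmin are used.
fit_power_law_alpha <- function(x, xmin) {
  tail_x <- x[!is.na(x) & x >= xmin]
  1 + length(tail_x) / sum(log(tail_x / xmin))
}

# Check on simulated data: inverse-CDF draws from a power law with alpha = 2.5, xmin = 1
set.seed(1)
u <- runif(10000)
sim <- (1 - u)^(-1 / (2.5 - 1))
fit_power_law_alpha(sim, xmin = 1)  # close to 2.5
```

On the real data one might try fit_power_law_alpha(WrdMurderNA$pop, xmin = 1e6), reusing the one-million threshold from WrdMurderNA_large. The estimate depends heavily on that cutoff, and a log-normal is a common competing model worth comparing.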
It would also be nice to turn the murder rate per capita into a categorical variable with five ordered levels and use it as the fill color in a bar chart with bars of uniform height, showing the proportion of countries at each level of population that have high, medium or low murder rates. This is done with the cut() function, splitting each variable at its quintiles:
WrdMurderNA$d100k5cat <- cut(WrdMurderNA$d100k,quantile(WrdMurderNA$d100k, probs=seq(0,1,0.2)))
WrdMurderNA$death04cat5 <- cut(WrdMurderNA$death04,quantile(WrdMurderNA$death04, probs=seq(0,1,0.2)))
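summary(WrdMurderNA$death04cat5) # still gives missing values: cut() intervals are open on the left, so the six countries with zero deaths fall outside the first bin (0,38]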
## (0,38] (38,208] (208,680]
## 33 38 38
## (680,2.56e+03] (2.56e+03,6.12e+04] NA's
## 38 38 6
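The NAs come from the way cut() builds its intervals. By default they are open on the left, (a, b], so a value equal to the lowest break (here the 0% quantile) belongs to no interval. The sketch below uses hypothetical counts to show the include.lowest argument, which closes the first interval. It also sets ordered_result, which makes the result a true ordered factor:

```r
deaths <- c(0, 0, 5, 12, 40, 300, 2500, 60000)  # hypothetical counts with two zeros
breaks <- quantile(deaths, probs = seq(0, 1, 0.2))

# Default: intervals are (a, b], so values equal to the 0% quantile become NA
default_cut <- cut(deaths, breaks)
sum(is.na(default_cut))  # 2

# include.lowest = TRUE makes the first interval [a, b], so the zeros are kept
fixed_cut <- cut(deaths, breaks, include.lowest = TRUE, ordered_result = TRUE)
sum(is.na(fixed_cut))    # 0
table(fixed_cut)
```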
WrdMurderNA$d100k5cat
## [1] (1.4,3.4] (3.4,9.91] (3.4,9.91] (0,1.4] (18.6,86.4]
## [6] (3.4,9.91] (3.4,9.91] (1.4,3.4] (0,1.4] (0,1.4]
## [11] (1.4,3.4] (18.6,86.4] (0,1.4] (3.4,9.91] (9.91,18.6]
## [16] (9.91,18.6] (1.4,3.4] (18.6,86.4] (9.91,18.6] (3.4,9.91]
## [21] (3.4,9.91] (1.4,3.4] (18.6,86.4] (18.6,86.4] (0,1.4]
## [26] (1.4,3.4] (9.91,18.6] (18.6,86.4] (18.6,86.4] (9.91,18.6]
## [31] (0,1.4] (9.91,18.6] (18.6,86.4] (18.6,86.4] (3.4,9.91]
## [36] (1.4,3.4] (18.6,86.4] (9.91,18.6] (18.6,86.4] (18.6,86.4]
## [41] <NA> (3.4,9.91] (18.6,86.4] (1.4,3.4] (3.4,9.91]
## [46] (0,1.4] (0,1.4] (0,1.4] (1.4,3.4] (9.91,18.6]
## [51] (9.91,18.6] (18.6,86.4] (0,1.4] (18.6,86.4] (18.6,86.4]
## [56] (9.91,18.6] (3.4,9.91] (18.6,86.4] (0,1.4] (1.4,3.4]
## [61] (0,1.4] (9.91,18.6] (9.91,18.6] (3.4,9.91] (0,1.4]
## [66] (9.91,18.6] (0,1.4] (3.4,9.91] (18.6,86.4] (9.91,18.6]
## [71] (18.6,86.4] (18.6,86.4] (3.4,9.91] (18.6,86.4] (1.4,3.4]
## [76] (0,1.4] (3.4,9.91] (3.4,9.91] (1.4,3.4] (3.4,9.91]
## [81] (0,1.4] (3.4,9.91] (0,1.4] (0,1.4] (3.4,9.91]
## [86] (9.91,18.6] (18.6,86.4] (3.4,9.91] (18.6,86.4] (1.4,3.4]
## [91] (1.4,3.4] (3.4,9.91] (3.4,9.91] (9.91,18.6] (1.4,3.4]
## [96] (9.91,18.6] (9.91,18.6] (1.4,3.4] (3.4,9.91] (0,1.4]
## [101] (3.4,9.91] (9.91,18.6] (9.91,18.6] (3.4,9.91] (1.4,3.4]
## [106] (9.91,18.6] (0,1.4] (1.4,3.4] (9.91,18.6] (1.4,3.4]
## [111] (3.4,9.91] (0,1.4] (3.4,9.91] <NA> (1.4,3.4]
## [116] (0,1.4] (18.6,86.4] (9.91,18.6] (9.91,18.6] (3.4,9.91]
## [121] (9.91,18.6] (0,1.4] (0,1.4] (9.91,18.6] (18.6,86.4]
## [126] (9.91,18.6] <NA> (0,1.4] (1.4,3.4] (3.4,9.91]
## [131] <NA> (9.91,18.6] (9.91,18.6] (9.91,18.6] (1.4,3.4]
## [136] (18.6,86.4] (1.4,3.4] (1.4,3.4] (0,1.4] (1.4,3.4]
## [141] (18.6,86.4] (18.6,86.4] (9.91,18.6] (18.6,86.4] (9.91,18.6]
## [146] (0,1.4] <NA> (3.4,9.91] (1.4,3.4] (9.91,18.6]
## [151] (1.4,3.4] (3.4,9.91] (18.6,86.4] (0,1.4] (1.4,3.4]
## [156] (1.4,3.4] (1.4,3.4] (1.4,3.4] (18.6,86.4] (1.4,3.4]
## [161] (3.4,9.91] (18.6,86.4] (9.91,18.6] (18.6,86.4] (0,1.4]
## [166] (0,1.4] (1.4,3.4] (1.4,3.4] (18.6,86.4] (3.4,9.91]
## [171] (9.91,18.6] (9.91,18.6] (0,1.4] (9.91,18.6] (1.4,3.4]
## [176] (1.4,3.4] (3.4,9.91] <NA> (18.6,86.4] (9.91,18.6]
## [181] (0,1.4] (1.4,3.4] (3.4,9.91] (3.4,9.91] (3.4,9.91]
## [186] (0,1.4] (18.6,86.4] (3.4,9.91] (1.4,3.4] (18.6,86.4]
## [191] (18.6,86.4]
## Levels: (0,1.4] (1.4,3.4] (3.4,9.91] (9.91,18.6] (18.6,86.4]
summary(WrdMurderNA)
## nat pop death02
## Afghanistan : 1 Min. :1.73e+03 Min. : 0
## Albania : 1 1st Qu.:1.35e+06 1st Qu.: 57
## Algeria : 1 Median :6.47e+06 Median : 345
## Andorra : 1 Mean :3.34e+07 Mean : 2921
## Angola : 1 3rd Qu.:2.15e+07 3rd Qu.: 1618
## Antigua and Barbuda: 1 Max. :1.30e+09 Max. :57516
## (Other) :185
## death04 dRate04 d100k d100k5cat
## Min. : 0 Min. :0.00e+00 Min. : 0.00 (0,1.4] :33
## 1st Qu.: 63 1st Qu.:1.92e-05 1st Qu.: 1.92 (1.4,3.4] :38
## Median : 328 Median :6.59e-05 Median : 6.59 (3.4,9.91] :38
## Mean : 3129 Mean :1.08e-04 Mean :10.76 (9.91,18.6]:38
## 3rd Qu.: 2062 3rd Qu.:1.63e-04 3rd Qu.:16.27 (18.6,86.4]:38
## Max. :61229 Max. :8.64e-04 Max. :86.39 NA's : 6
##
## death04cat5
## (0,38] :33
## (38,208] :38
## (208,680] :38
## (680,2.56e+03] :38
## (2.56e+03,6.12e+04]:38
## NA's : 6
##
WrdMurderNAnone <- WrdMurderNA
summary(WrdMurderNAnone)
## nat pop death02
## Afghanistan : 1 Min. :1.73e+03 Min. : 0
## Albania : 1 1st Qu.:1.35e+06 1st Qu.: 57
## Algeria : 1 Median :6.47e+06 Median : 345
## Andorra : 1 Mean :3.34e+07 Mean : 2921
## Angola : 1 3rd Qu.:2.15e+07 3rd Qu.: 1618
## Antigua and Barbuda: 1 Max. :1.30e+09 Max. :57516
## (Other) :185
## death04 dRate04 d100k d100k5cat
## Min. : 0 Min. :0.00e+00 Min. : 0.00 (0,1.4] :33
## 1st Qu.: 63 1st Qu.:1.92e-05 1st Qu.: 1.92 (1.4,3.4] :38
## Median : 328 Median :6.59e-05 Median : 6.59 (3.4,9.91] :38
## Mean : 3129 Mean :1.08e-04 Mean :10.76 (9.91,18.6]:38
## 3rd Qu.: 2062 3rd Qu.:1.63e-04 3rd Qu.:16.27 (18.6,86.4]:38
## Max. :61229 Max. :8.64e-04 Max. :86.39 NA's : 6
##
## death04cat5
## (0,38] :33
## (38,208] :38
## (208,680] :38
## (680,2.56e+03] :38
## (2.56e+03,6.12e+04]:38
## NA's : 6
##
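# is.na(death04cat5 == FALSE) is TRUE only where death04cat5 is NA, so this keeps just the six
# zero-homicide countries, as the summary below shows; subset(WrdMurderNA, !is.na(death04cat5)) would drop them instead
WrdMurderNAnone <- subset(WrdMurderNA, is.na(death04cat5==FALSE))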
summary(WrdMurderNAnone)
## nat pop death02 death04 dRate04
## Cook Is :1 Min. : 1728 Min. :0 Min. :0 Min. :0
## Monaco :1 1st Qu.:12003 1st Qu.:0 1st Qu.:0 1st Qu.:0
## Niue :1 Median :19439 Median :0 Median :0 Median :0
## Palau :1 Mean :19209 Mean :0 Mean :0 Mean :0
## San Marino:1 3rd Qu.:27242 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0
## Tuvalu :1 Max. :35282 Max. :0 Max. :0 Max. :0
## (Other) :0
## d100k d100k5cat death04cat5
## Min. :0 (0,1.4] :0 (0,38] :0
## 1st Qu.:0 (1.4,3.4] :0 (38,208] :0
## Median :0 (3.4,9.91] :0 (208,680] :0
## Mean :0 (9.91,18.6]:0 (680,2.56e+03] :0
## 3rd Qu.:0 (18.6,86.4]:0 (2.56e+03,6.12e+04]:0
## Max. :0 NA's :6 NA's :6
##
#death04nrm <- death04/sum(death04)
#death04nrm5cat <- cut(death04nrm, quantile(death04nrm, probs=seq(0,1,0.2)))
#WrdMurder$d100knrmQuintiles100 <- cut(WrdMurder$d100knrm,quantile(WrdMurder$d100knrm, probs=seq(0,1,0.2)))
Now I will make the histogram of population with murder rates as the color. Since the population variable is so spread out I will make the x scale logarithmic.
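A quick calculation shows why a logarithmic axis helps. Using the minimum and maximum from summary(WrdMurderNA$pop) above, the populations span almost six orders of magnitude, so on a linear axis most countries end up squeezed against the left edge:

```r
# Minimum and maximum population, as printed by summary(WrdMurderNA$pop)
pop_range <- c(1.73e3, 1.30e9)

# Each unit on a log10 axis is a factor of ten
orders_of_magnitude <- diff(log10(pop_range))
round(orders_of_magnitude, 2)  # 5.88
```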
greys <- c("grey80", "grey70", "grey60", "grey50", "grey40") # one shade per quintile
pPop <- ggplot(aes(x = pop), data = WrdMurderNA)
pPop + geom_histogram(aes(fill = d100k5cat)) + scale_fill_manual(values = greys) + scale_x_log10()
pPop + geom_density(aes(fill = d100k5cat)) + scale_fill_manual(values = greys) + scale_x_log10()
# position is a geom argument, not an aesthetic, so it belongs outside aes()
pPop + geom_bar(aes(fill = d100k5cat), position = "stack") + scale_fill_manual(values = greys) + scale_x_log10()
# death04/sum(death04) only rescales death04, so its quintiles give the same groups as death04cat5
pPop + geom_bar(aes(fill = death04cat5), position = "stack") + scale_fill_manual(values = greys) + scale_x_log10()
# pop is stored as integer and the world total exceeds R's integer range, so convert before summing
WrdMurderNA$popNorm <- as.numeric(WrdMurderNA$pop) / sum(as.numeric(WrdMurderNA$pop))
sum(WrdMurderNA$popNorm)
class(WrdMurderNA$popNorm)
nrow(WrdMurderNA)
pPopNorm <- ggplot(aes(x = popNorm), data = WrdMurderNA)
pPopNorm + geom_bar(aes(fill = death04cat5), position = "stack") + scale_fill_manual(values = greys) + scale_x_log10()
pop_dist <- ggplot(aes(x = pop), data = na.omit(WrdMurderNA))
pop_dist + geom_histogram(aes(fill = d100k5cat), binwidth = 0.1, position = "fill") + scale_x_log10()
pop_dist + geom_histogram(aes(fill = d100k5cat), binwidth = 0.1, position = "fill") + scale_fill_manual(values = greys) + scale_x_log10()
plot <- ggplot(aes(x = pop), data = na.omit(WrdMurderNA))
plot + geom_histogram(aes(y = ..density..)) + facet_grid(.~d100k5cat) # this makes the facets side by side.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
plot + geom_histogram(aes(y = ..density..)) +
facet_grid(d100k5cat~.) +
scale_x_log10()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
plot + geom_histogram(aes(fill = d100k5cat), position = "fill") +
scale_x_log10()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
plot + geom_histogram(aes(y = ..density.., color = d100k5cat, fill = d100k5cat)) +
scale_x_log10() # this is the stacked effect I was looking for and have now found by accident.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
plot + geom_freqpoly(aes(y = ..density.., color = d100k5cat)) +
scale_x_log10()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
sum(WrdMurderNA$d100k)
WrdMurderNA$d100knrm <- WrdMurderNA$d100k / sum(WrdMurderNA$d100k)
OK, what I really want here is one big rectangle divided into bands of color. What is that called?
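A chart in which every bar has the same full height and is divided into colored bands is commonly called a 100% stacked bar chart. The position = "fill" histograms above produce this kind of display, and base R's spineplot() draws a variant whose bar widths also reflect group sizes. The numbers behind such a chart are the row proportions of a contingency table. Here is a sketch with hypothetical categories:

```r
# Hypothetical size and murder-rate categories for seven countries
size <- factor(c("small", "small", "small", "large", "large", "large", "large"),
               levels = c("small", "large"))
rate <- factor(c("low", "high", "high", "low", "low", "low", "high"),
               levels = c("low", "high"))

# Row proportions: within each size class, the share of low- and high-rate countries.
# Each row sums to 1, which is why every bar in the chart has the same height.
props <- prop.table(table(size, rate), margin = 1)
props
# spineplot(rate ~ size) would draw these proportions as one rectangle divided into bands
```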
Next, I focus on the extremes of the distribution, using the xlim() function to zoom in on the tails and find the ten largest and ten smallest countries in my data.
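xlim() only restricts the visible range of the axis. To list the countries themselves, the data can be sorted with order(). The sketch below uses a hypothetical five-country data frame, with k = 2 standing in for the ten used in the text:

```r
# Hypothetical countries and populations
toy <- data.frame(
  nat = c("A", "B", "C", "D", "E"),
  pop = c(500, 20, 9000, 75, 1200),
  stringsAsFactors = FALSE
)
k <- 2  # would be 10 for the real data

largest <- head(toy[order(toy$pop, decreasing = TRUE), ], k)
smallest <- head(toy[order(toy$pop), ], k)

largest$nat   # "C" "E"
smallest$nat  # "B" "D"
```

For the real data the largest would be head(WrdMurderNA[order(WrdMurderNA$pop, decreasing = TRUE), c("nat", "pop")], 10).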
I then go back to an earlier plot and use the labeling capabilities of R to mark them out on the larger graph.
I then compare murder to population to see if there is a pattern connecting the size of a country and the number of murders it has.
I compute the Pearson correlation coefficient to see whether there is a relationship. Because the number of murders should grow with population even if every country were equally dangerous, a strong relation between population and number of murders suggests only that small countries are inherently no safer than large ones. If, on the other hand, large countries have systematically lower murder rates, then we can speculate that Collier's thesis, that the provision of security operates under significant economies of scale, has some support in these data.
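The sketch below illustrates that distinction with hypothetical numbers. Raw murder counts correlate almost perfectly with population even when risk is identical everywhere, so the murder rate is the more informative variable to set against population. The rank (Spearman) correlation in the last line is a swap-in, used because the relation need not be linear:

```r
# Hypothetical populations spanning four orders of magnitude
pop <- c(1e5, 1e6, 1e7, 1e8)

# Scenario 1: the same rate (5 per 100,000) everywhere, so no country is safer
murders_same <- pop * 5 / 1e5
cor(pop, murders_same)  # 1: a perfect correlation despite identical risk

# Scenario 2: the rate halves with each tenfold increase in population
rate_falling <- c(20, 10, 5, 2.5)
murders_falling <- rate_falling * pop / 1e5
cor(pop, murders_falling)  # still close to 1
cor(log10(pop), rate_falling, method = "spearman")  # -1: larger countries are safer
```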
Next, I plot the countries in order of their size along the x axis, with their murders on the y axis, first as a bar chart.
And now as a scatter plot.
Now I add lines showing the median, 25th and 75th percentiles. (Does this even make sense?)
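The reference values themselves come from quantile(). As horizontal lines on the murders axis, they mark the levels below which a quarter, half and three quarters of the countries fall, which is a meaningful reference for the y variable. Here is a sketch with hypothetical counts. The geom_hline() call in the comment is the ggplot2 layer one would add:

```r
# Hypothetical murder counts for seven countries
murders <- c(0, 12, 40, 75, 300, 2500, 60000)

ref <- quantile(murders, probs = c(0.25, 0.5, 0.75))
ref  # 25%: 26, 50%: 75, 75%: 1400
# In ggplot2: + geom_hline(yintercept = ref, linetype = "dashed")
```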
Now I calculate the murder rate by dividing the raw number of murders by the country’s population and display the results in a histogram.
To make the results more interpretable I employ some transformations of the data. First, I simply multiply by 100k to get the murder rate per 100,000 and display the results in a histogram.
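These two steps can be checked by hand against the first rows of the data. The sketch below recomputes dRate04 and d100k for three countries, using the death and population figures printed by head() earlier:

```r
# Deaths in 2004 and population for three countries, from the head() output above
nat <- c("Afghanistan", "Albania", "Angola")
death04 <- c(813, 208, 6226)
pop <- c(26693486, 3124861, 15957460)

death_rate <- death04 / pop   # deaths per person, as in dRate04
per_100k <- death_rate * 1e5  # deaths per 100,000 people, as in d100k
names(per_100k) <- nat
round(per_100k, 3)  # 3.046, 6.656, 39.016, matching the d100k column
```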
Now, I do the same thing by adjusting the scale.
Now I employ some statistical transformations to make the distribution more normal. First I employ the log transformation.
Next, I try a square-root function.
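One caution applies before either transformation. Six countries have a murder rate of exactly zero, and log(0) is -Inf, so those countries silently drop out of a log-scale plot. The sketch below compares the options on hypothetical rates. Adding 1 before taking the log is one common convention, not the only one:

```r
# Hypothetical murder rates per 100,000, including zeros as in the real data
d100k <- c(0, 0, 1.4, 6.6, 16.3, 86.4)

log10(d100k)      # the zeros become -Inf and are dropped from a plot
log10(d100k + 1)  # shifting by 1 first keeps them (they map to 0)
sqrt(d100k)       # the square root is defined at zero, so no shift is needed
```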
Now I add the last layer of the plot to find the interesting cases and outliers. There are two kinds of outliers: those with unusually large or small values of the dependent variable in their own right, and those with an unusual value of the dependent variable given their population. As I identify these outliers I will
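The second kind of outlier can be found by regressing murders on population and looking at the residuals. The minimal sketch below rests on two assumptions. First, both variables are logged, so that a constant murder rate appears as a straight line. Second, the data are hypothetical. On the real data, the zero-murder countries would have to be dropped or shifted before taking logs:

```r
# Hypothetical data: murders proportional to population (10 per 100,000),
# except the third country, which has ten times the expected number
pop <- 10^(4:9)
murders <- pop * 1e-4
murders[3] <- murders[3] * 10

# On log scales a constant murder rate is a straight line with slope 1
fit <- lm(log10(murders) ~ log10(pop))

# Countries far from the fitted line are unusual given their population
res <- residuals(fit)
which.max(abs(res))  # 3
```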
Inspect the structure of the data.
#str(Homicide)
Why does it treat population as a factor? I decide to change it to an integer, since you can't have a fraction of a person.
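A common pitfall arises when fixing this: as.numeric() applied directly to a factor returns the internal level codes, not the numbers printed on screen. The sketch below assumes, hypothetically, that the column was read as text because of thousands separators. In R versions before 4.0, read.csv() turned such text columns into factors by default, as it did with the nat column in the str() output above:

```r
# Hypothetical population column read as text because of thousands separators
# (figures from the head() output, written with commas for illustration)
pop_raw <- factor(c("26,693,486", "3,124,861", "75,292"))

as.numeric(pop_raw)  # WRONG: 1 2 3, the level codes rather than the populations

# Convert via character, stripping the commas first
pop_num <- as.numeric(gsub(",", "", as.character(pop_raw)))
pop_num  # 26693486 3124861 75292
```

as.integer() could replace as.numeric() in the last step, since every population in this data set fits in R's integer range.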
hist(Homicide$04, breaks = "Sturges")
It turns out that you can't use a bare number as a variable name after $, because R interprets it as a numeric constant (it would have to be wrapped in backticks), so I am changing it back to death04!
#names(Homicide) <- c("country","pop","death02","death04")
#head(Homicide)
hist(Homicide$death04, breaks = seq(0, 70000, by = 5000)) # equal 5000-wide bins; breaks = c(0,70000,5000) gave unequal bins, which is why it became a density plot
#Homicide$pop
#rm(Homicide$pop)
Now make a per capita variable.
Homicide$perCap04 <- Homicide$death04/Homicide$popNum
head(Homicide)
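This is also why the call with breaks = c(0, 70000, 5000) produced a density plot. hist() treats a vector of breaks as the bin edges themselves and sorts them into 0, 5000 and 70000. Because the resulting bins are unequal, it defaults to plotting densities instead of counts. The sketch below uses hypothetical values and plot = FALSE, so that only the computed bins are inspected:

```r
x <- c(100, 2500, 4000, 12000, 60000)  # hypothetical death counts

# Three numbers are taken as bin edges and sorted: two bins of unequal width
h_bad <- hist(x, breaks = c(0, 70000, 5000), plot = FALSE)
h_bad$breaks    # 0 5000 70000
h_bad$equidist  # FALSE, so hist() would plot densities

# seq() gives equally spaced edges every 5000 from 0 to 70000
h_good <- hist(x, breaks = seq(0, 70000, by = 5000), plot = FALSE)
length(h_good$breaks) - 1  # 14 bins
h_good$equidist            # TRUE, so hist() plots counts
```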