1. Assignment

OK, here's your assignment.

2. Solution

2.1 Read the data

library(foreign)
zipcoor <- read.dta("zipcoor.dta"); zipcoor

2.2 Merge the Postal Codes and Coordinates

From the reader, we repeat the code needed to read and organize the data.

sr <- read.dta("sr.dta")
zip_sr <- read.dta("zip.dta")
sr_szip <- merge(sr,zip_sr,by.x="sender",by.y="id",all.x=T,all.y=T)
library(reshape)
sr_szip <- rename(sr_szip, c(pc4="pc4s"))
sr_srzip <- merge(sr_szip,zip_sr,by.x="receiver",by.y="id",all.x=T)
sr_srzip <- rename(sr_srzip, c(pc4="pc4r"))
col_order <- c("sender","pc4s","receiver","pc4r")
sr_srzip <- sr_srzip[order(sr_srzip$sender,sr_srzip$receiver),col_order]
sr_srzip

We now know who sends to whom: each row holds a sender and a receiver, along with their 4-digit postal codes.
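The pattern above, looking up the same table twice with a rename in between, can be sketched on made-up data. The objects toy_sr, toy_zip and toy_result and all their values are invented for illustration and are not the contents of sr.dta or zip.dta. Base R's names() is used instead of rename() so the sketch runs without the reshape package.

```r
# Toy data (hypothetical; not the contents of sr.dta or zip.dta)
toy_sr  <- data.frame(sender = c(1, 2), receiver = c(2, 3))
toy_zip <- data.frame(id = c(1, 2, 3), pc4 = c(1011, 3511, 9712))

# First lookup: postal code of the sender
s <- merge(toy_sr, toy_zip, by.x = "sender", by.y = "id", all.x = TRUE)
names(s)[names(s) == "pc4"] <- "pc4s"

# Second lookup against the same table: postal code of the receiver
sr2 <- merge(s, toy_zip, by.x = "receiver", by.y = "id", all.x = TRUE)
names(sr2)[names(sr2) == "pc4"] <- "pc4r"

# Sort by sender and receiver, and put the columns in a readable order
toy_result <- sr2[order(sr2$sender, sr2$receiver),
                  c("sender", "pc4s", "receiver", "pc4r")]
toy_result
```

Renaming pc4 after the first merge is what keeps the second merge from producing two columns with the same name.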

Next, we want to add the coordinates to the postal codes in the variables pc4s and pc4r.

Let's do it in steps.

In step (a), we add the coordinates to the postal code of the sender.

We rename xcoor and ycoor, because in step (b) we will merge the same variables again, but then to the postal codes of the receiver.

a <- merge(sr_srzip,zipcoor,by.x="pc4s",by.y="zip",all.x=T,all.y=F)
a <- rename(a, c(xcoor="xcs",ycoor="ycs"))
a

Check for yourself that, as intended, the x and y coordinates (in xcs and ycs) are correct.

For sender 9, no postal code is known, and hence the coordinates appear as NA.
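To see in isolation why a sender without a postal code ends up with NA coordinates, here is a minimal sketch on made-up data. The objects toy_sr, toy_zipcoor and toy_a and their values are hypothetical, not taken from the .dta files, and names() again stands in for rename() so that only base R is needed.

```r
# Toy data (hypothetical values, not taken from sr.dta or zipcoor.dta)
toy_sr <- data.frame(sender = c(1, 2, 9),
                     pc4s   = c(1011, 3511, NA))
toy_zipcoor <- data.frame(zip   = c(1011, 3511),
                          xcoor = c(10, 20),
                          ycoor = c(30, 40))

# all.x=TRUE keeps every row of toy_sr, whether or not a matching zip exists;
# all.y=FALSE drops zip codes that no sender uses
toy_a <- merge(toy_sr, toy_zipcoor, by.x = "pc4s", by.y = "zip",
               all.x = TRUE, all.y = FALSE)
names(toy_a)[names(toy_a) == "xcoor"] <- "xcs"
names(toy_a)[names(toy_a) == "ycoor"] <- "ycs"
toy_a
```

With all.x=FALSE instead, the row for sender 9 would be dropped rather than kept with missing coordinates.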

Step (b), adding the coordinates of the receiver, starts with the result of step (a).

b <- merge(a,zipcoor,by.x="pc4r",by.y="zip",all.x=T,all.y=F)
b <- rename(b, c(xcoor="xcr",ycoor="ycr"))
b
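The renaming in step (a) matters because the second merge would otherwise find xcoor and ycoor in both inputs. A minimal sketch on made-up data (toy_a, toy_zipcoor and toy_b are hypothetical, as are their values) shows what merge() does in that case: it keeps both copies and appends its default suffixes .x and .y, which are harder to read than xcs and xcr.

```r
# Hypothetical result of step (a) WITHOUT renaming xcoor and ycoor
toy_a <- data.frame(pc4s = 1011, pc4r = 3511, xcoor = 10, ycoor = 30)
toy_zipcoor <- data.frame(zip   = c(1011, 3511),
                          xcoor = c(10, 20),
                          ycoor = c(30, 40))

# Same call as in step (b); the clashing column names get suffixes
toy_b <- merge(toy_a, toy_zipcoor, by.x = "pc4r", by.y = "zip",
               all.x = TRUE, all.y = FALSE)
names(toy_b)
```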

Now we have all the information we need to compute the distances between sender and receiver.

Let's first reorder the data set a bit and store it in our final object, sr_final.

Well, almost final: we still have to add the straight-line distance, based on the Pythagorean theorem.

col_order <- c("sender","pc4s","xcs","ycs","receiver","pc4r","xcr","ycr")
sr_final <- b[order(b$sender,b$receiver),col_order]
sr_final$dist <- with(sr_final, sqrt((xcs-xcr)^2 + (ycs-ycr)^2))
sr_final
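As a quick check of the distance formula, here is a small sketch on made-up coordinates. The data frame toy and its values are hypothetical: the first row forms a 3-4-5 right triangle, and the second row mimics a sender whose coordinates are missing. with() is used so that the column names can be written without the toy$ prefix.

```r
# Hypothetical coordinates: a 3-4-5 triangle and one missing sender location
toy <- data.frame(xcs = c(0, NA), ycs = c(0, NA),
                  xcr = c(3, 5),  ycr = c(4, 5))

# Straight-line distance between sender and receiver
toy$dist <- with(toy, sqrt((xcs - xcr)^2 + (ycs - ycr)^2))
toy$dist
```

The first distance is 5, and the second is NA, because arithmetic on a missing coordinate yields a missing result. The same happens in sr_final for any pair in which the sender or the receiver has no known postal code.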