My friends Ido and Boaz visited Japan recently, and while there they posted a really cool picture to Facebook. In this picture we saw a beautiful katana blade named “Tentetsutou”, which translates roughly to “Sword of Heaven”.
You can see the sword in all its glory here.
Now, there is a very good reason it has the name it does, namely that master swordsmith Yoshindo Yoshiwara forged the thing out of fragments of the Gibeon meteorite. Which is really damned cool.
I took this Facebook post as an opportunity to practice my budding R skills, by asking myself:
Can I plot the location where the meteorite was found on a map?
Let’s find out!
Fortunately for me, NASA has a gigantic pile of public datasets available, one of which happens to be a dataset about meteorites landing on earth (includes both the ones we found, and the ones we only saw falling down). You can see the dataset here.
Now obviously, I could have simply used the filters on that page and then hit “visualize” and I would have been done. But that’s just not how I roll anymore.
Instead, we’re going to do this the best way possible; The hard way.
First, let’s go grab that tasty, tasty dataset:
setInternet2(use = TRUE) #fix for Rstudio on windows 8.1
fileurl <- "https://data.nasa.gov/api/views/gh4g-9sfh/rows.csv?accessType=DOWNLOAD"
download.file(fileurl, destfile="meteorites.csv")
data <- read.csv("meteorites.csv", stringsAsFactors = FALSE)
Let’s have a look at what this dataset is like:
## name id nametype recclass mass..g. fall year
## 1 Aachen 1 Valid L5 21 Fell 01/01/1880 12:00:00 AM
## 2 Aarhus 2 Valid H6 720 Fell 01/01/1951 12:00:00 AM
## 3 Abee 6 Valid EH4 107000 Fell 01/01/1952 12:00:00 AM
## reclat reclong GeoLocation
## 1 50.77500 6.08333 (50.775000, 6.083330)
## 2 56.18333 10.23333 (56.183330, 10.233330)
## 3 54.21667 -113.00000 (54.216670, -113.000000)
Pretty cool stuff. We have a name, a bunch of variables pertaining to the type of rock that came screaming from the sky (on fire, how metal is that?) including it’s weight in grams and most importantly, three columns of location data (Latitude, Longitude and GeoLocation, which is both)
Unfortunately, for what I have mind we don’t really need some of these columns.
I’m not particularly interested in when they fell, or what science-y classification they have, and I am only going to be using Lat. and Long. separately so the GeoLocation field can go as well. For my amusement we’re going to keep the field that tells us if they found the damned rock.
So, let’s make a new data frame and keep only the following columns:
We’re also giving these columns new names so I don’t go nuts typing irritating things.
data2 <- data.frame(
data$name,
data$mass..g.,
data$fall,
data$reclat,
data$reclong,
stringsAsFactors = FALSE,
row.names= NULL
)
names(data2) <- c("name","mass","found","lat","long")
head(data2, n=3)
## name mass found lat long
## 1 Aachen 21 Fell 50.77500 6.08333
## 2 Aarhus 720 Fell 56.18333 10.23333
## 3 Abee 107000 Fell 54.21667 -113.00000
That’s more like it!
I don’t like that meteorites that weren’t found are called “Fell” of “Found” in our “found” column, so I’m going to replace those values with a “Yes” or a “No”.
data2$found[(data2$found == "Fell")] <- "No"
data2$found[(data2$found == "Found")] <- "Yes"
With my (completely useless) OCD satisfied we can now get to the good bit!
Ok goodie, in order to draw the map, and then place our data points onto it we will be using two packages: 1. ggplot2 - This package is what everyone uses to draw things. You should too. 2. ggmap - This package kinda does what it says on the tin, you get maps.
Let’s fire those babies up:
library(ggplot2)
library(ggmap)
Since these rocks pretty much decide all by themselves where they are going to forcefully embed themselves into the soft, supple, flesh of our planet, we’re going to grab the map of the entire world.
Note: The standard map projection is the Mercator projection. I am not a map nerd, and thus I do not have any particular preference one way or another. Apparently arguing about map projections is a bit of a “thing” amongst people far smarter than I am.
mapWorld <- borders("world", colour="gray50", fill="gray50")
mp <- ggplot() + mapWorld
Now that we have a map, we use ggplot to stick our meteorites onto it:
Ah! But dear grasshopper, you never checked to see if the coordinate data was not filthy. Shame on you for not looking into this!
We have a bunch of meteorites on file without coordinates:
sum(is.na(data2$long))
## [1] 7315
sum(is.na(data2$lat))
## [1] 7315
Fortunately, ggplot/map is generally smart enough to be able to ignore entries with missing values by adding na.rm=TRUE to the code. Unfortunately, our problems do not end there…
## Longitude Latitude
## 1 354.4733 81.16667
## 2 -165.4333 -87.36667
These are the maximum and minimum values in our Longitude and Latitude columns. For those of you who might not know how the Long/Lat coordinates system works;
specifies the east-west position of a point on the Earth’s surface. It ranges from 0° at the Prime Meridian (in Greenwich, England) to +180° eastward and -180° westward
Specifies the north-south position of a point on the Earth’s surface. Latitude is an angle which ranges from 0° at the Equator to 90° (North or South). General convention is to use positive values for north, and negative values for south.
Brandishing this new knowledge, we can see there is a bit of a problem. Apparently there are Longitude values in our dataset far in excess of the maximum value of 180.
Because I do not actually know how these values ended up in the dataset (if you’re a NASA guy reading this, please email me the reason at *helicity@machine9.net*), and I cannot plot without coordinates, I’m going to remove them, and any NA values.
We’re going to make a new dataset, and include ONLY the rows that have Long/Lat values within the allowed parameters.
data3 <- subset(data2,
lat >= -90 & lat <= 90 & long >= -180 & long <= 180
)
data3 <- subset(data3,
!(is.na(lat) | is.na(long))
)
Now when can run our drawing script, it should prove to be more cooperative:
mapPoints <- mp +
geom_point(aes(x = long, y = lat), data = data3, alpha = .5)
mapPoints
You actually get a series of warning messages if you do this, I’ve been told it’s because ggmaps doesn’t like longitude values > 90 for some reason. We’re going to ignore them.
Great, we have our points plotted, all we need to do now is merge them to our map, right? We could! But we’re not going to just yet!
I want to see which of these meteorites were actually found. So we’re going to map the colour of the points.
mapPoints <- mp +
geom_point(aes(x = long, y = lat, color=found), data = data3, alpha = .3)
mapPoints
That is a lot more like it!
But there is one thing missing now, the reason we came here to begin with. We want to see where the meteorite came from that was made into the beautiful sword Ido and Boaz saw in Japan.
Fortunately for us, Gibeon was not one of the meteorites missing the oh-so-important coordinate data.
data3[which(data3$name=="Gibeon"),1:5]
## name mass found lat long
## 12600 Gibeon 2.6e+07 Yes -25.5 18
All we need to do now, is add an additional layer to our map showing Gibeon (let’s give it a nametag while we’re at it, and “unsquish” the map)
final <- mapPoints +
geom_point(data=data3[which(data3$name=="Gibeon"),1:5],
aes(x = long, y = lat), color="black", size=5)
final <- final + geom_text(data=data3[which(data3$name=="Gibeon"),1:5],
aes(label=name, size=6, x=long, y=lat, vjust=1.6, hjust=0.9, face="bold")) +
guides(size=FALSE) +
theme(legend.position="bottom") +
ggtitle("Known meteorites, and the location where Gibeon was found") +
theme(plot.title = element_text(lineheight=.8, face="bold")) +
scale_colour_manual(values=c("#660000","#FF0000"))
final
Voila! (you can also see a bigger version here. )