09 December, 2018
The data for my project came from the UCI website, which hosts several other projects as well. The project I chose was about the origins of music.
The data set has a total of 118 variables and 1059 observations. The last two columns were the ones I was most concerned with, because they hold the latitude and longitude of the city in the country that the music comes from.
library(dplyr)
# Keep only the last two columns and name them after what they hold
LatLong <-
  musicOrigins %>%
  select(V117, V118) %>%
  rename(Latitude = V117, Longitude = V118)
Here we made a new data frame that keeps only the last two columns. We also replaced their default names, V117 and V118, with what they actually represent: Latitude and Longitude.
# Rename columns 117 and 118 directly in the original data set
names(musicOrigins)[c(117, 118)] <- c("Latitude", "Longitude")
Here we renamed the same two columns directly in the original data set, rather than creating a new data frame that contains only those two columns.
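The two renaming approaches can be compared side by side. This is a base-R sketch (no dplyr) on a tiny made-up data frame `toy`; its values are hypothetical, and computing the positions with `ncol()` assumes the coordinates are always the last two columns, as they are in this data set.

```r
# Hypothetical stand-in for the raw file: one made-up feature column plus
# two coordinate columns, named V1..V3 the way read.csv(header = FALSE) does
toy <- data.frame(V1 = c(0.2, -1.1, 0.7),
                  V2 = c(-20, 40, 35),
                  V3 = c(-45, 10, 140))

# Positions of the last two columns, whatever the total number of columns
last_two <- c(ncol(toy) - 1, ncol(toy))

# Approach 1: a new data frame that keeps only those two columns
latlong <- toy[, last_two]
names(latlong) <- c("Latitude", "Longitude")

# Approach 2: rename the same two columns in place in the original data
names(toy)[last_two] <- c("Latitude", "Longitude")

print(latlong)
print(names(toy))
```

The first approach leaves the original untouched; the second keeps every other column available for later steps such as clustering on the music features.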
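# k-means on the coordinates only, with 6 clusters (seed fixed for reproducibility)
set.seed(1059)
musicClusters <- LatLong %>%
  kmeans(centers = 6) %>%
  fitted(method = "classes") %>%  # cluster number for each row
  as.character()                  # text, so ggplot treats it as a category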
library(ggplot2)
LatLong <- LatLong %>%
  mutate(cluster = musicClusters)
# Longitude on x and latitude on y, so the scatter plot resembles a world map
LatLong %>%
  ggplot(aes(x = Longitude, y = Latitude)) +
  geom_point(aes(color = cluster), alpha = 0.25)
There is a graphing option in Chapter 9 that plots points on a coordinate plane so that they look like a map of the world. That is where k-means helps: it divides the coordinate plane into different regions.
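To see what k-means does with coordinates, here is a base-R sketch on simulated points. The three "origin" locations and the scatter around them are made up for illustration, and `nstart = 10` is an addition that is not in the code above. Note that `fitted(km, method = "classes")` is the same call as the post's `fitted("class")`, which works because R partially matches `"class"` to `"classes"`.

```r
set.seed(1059)

# Hypothetical coordinates: 20 points scattered around each of three
# made-up centres (latitude, longitude in degrees)
origins <- data.frame(lat = c(-20, 40, 35), long = c(-45, 10, 140))
group <- rep(1:3, each = 20)
pts <- data.frame(Latitude  = origins$lat[group]  + rnorm(60, sd = 2),
                  Longitude = origins$long[group] + rnorm(60, sd = 2))

# k-means splits the plane into 3 regions; nstart = 10 runs several random
# starts and keeps the one with the lowest total within-cluster sum of squares
km <- kmeans(pts, centers = 3, nstart = 10)

# One cluster number per point, as text so a plot treats it as a category
cluster <- as.character(fitted(km, method = "classes"))
print(table(cluster, group))

# Check: every point sits in the cluster whose centre is nearest to it
ctr <- km$centers
nearest <- apply(as.matrix(pts), 1,
                 function(p) which.min(colSums((t(ctr) - p)^2)))
print(all(nearest == km$cluster))
```

The nearest-centre check is what "dividing the plane into regions" means here: each region is the set of locations closer to one cluster centre than to any other.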
Music Points on the World
The lowest point on the left is one I examined. Its coordinates landed me in Brazil, near the city of Sao Paulo. After some research, I found that samba originated there. It is a pretty cool feature to be able to take these points and look up where they came from.
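Instead of reading the point off the plot, it can be pulled out of the data directly. This is a minimal base-R sketch on a made-up `LatLong` with five rows; it assumes "to the left" means west of the prime meridian (negative longitude) and "lowest" means the smallest latitude.

```r
# Hypothetical coordinates standing in for LatLong (values made up)
LatLong <- data.frame(Latitude  = c(-20, 40, 35, -10, 15),
                      Longitude = c(-45, 10, 140, 120, -90))

# "To the left": west of the prime meridian (negative longitude)
west <- LatLong[LatLong$Longitude < 0, ]

# "Lowest": the smallest latitude, i.e. the furthest south, among those
lowest <- west[which.min(west$Latitude), ]
print(lowest)
```

The resulting latitude and longitude can then be entered into any map tool to see which city they point to.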
# Cluster on the music features only: drop the coordinate columns first
set.seed(1059)
musicClusters <- musicOrigins %>%
  select(-Latitude, -Longitude) %>%
  kmeans(centers = 6) %>%
  fitted(method = "classes") %>%
  as.character()
musicOrigins <-
  musicOrigins %>%
  mutate(cluster = musicClusters)
Here we made the clusters from the music information alone, leaving out the latitude and longitude columns.
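The same idea in base R, with one extra step: cross-tabulating the clusters against location to see how clusters built without coordinates spread across places. The data frame `music`, its two features `f1` and `f2`, and the places "A", "B" and "C" are hypothetical, and the features are random noise, so the table only shows the mechanics, not a real result.

```r
set.seed(1059)

# Hypothetical stand-in for musicOrigins: two made-up audio features plus
# coordinates for 30 tracks from three made-up places
place <- rep(c("A", "B", "C"), each = 10)
lat_of  <- c(A = -20, B = 40, C = 35)
long_of <- c(A = -45, B = 10, C = 140)
music <- data.frame(f1 = rnorm(30), f2 = rnorm(30),
                    Latitude  = unname(lat_of[place]),
                    Longitude = unname(long_of[place]))

# Cluster on the audio features only: drop the coordinate columns first
features <- music[, setdiff(names(music), c("Latitude", "Longitude"))]
km <- kmeans(features, centers = 3, nstart = 10)
music$cluster <- as.character(km$cluster)

# Location played no part in the clustering, so a cluster can mix tracks
# from several places; the table shows how each cluster spreads across them
print(table(cluster = music$cluster, place = place))
```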
# Jitter: shift each point by a random amount of up to r = 3 degrees
n <- nrow(musicOrigins)
r <- 3
set.seed(4040)
musicOrigins2 <-
  musicOrigins %>%
  mutate(lat = Latitude + runif(n, min = -r, max = r),
         long = Longitude + runif(n, min = -r, max = r))
Now the points can be plotted near their origins, and because they are jittered, points from the same place no longer sit exactly on top of each other, which makes them easier to see.
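The jitter step on its own, as a base-R sketch: five made-up tracks that share the same coordinates get independent uniform offsets in the range from -r to r degrees. The coordinate values are hypothetical.

```r
set.seed(4040)

# Hypothetical stand-in: five tracks that all share the same coordinates
Latitude  <- rep(-20, 5)
Longitude <- rep(-45, 5)
n <- length(Latitude)
r <- 3  # maximum shift in degrees

# Add an independent uniform offset in [-r, r] to each coordinate
lat  <- Latitude  + runif(n, min = -r, max = r)
long <- Longitude + runif(n, min = -r, max = r)

print(data.frame(lat, long))
```

ggplot2's `geom_jitter(width = 3, height = 3)` adds a similar offset of up to 3 in each direction at plot time, but computing the jittered columns up front, with a fixed seed, keeps the same shifted positions for every plot.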
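library(ggplot2)  # map_data() also needs the maps package installed
# World outlines, without Antarctica
WorldData <- map_data("world") %>%
  filter(region != "Antarctica")
musicOrigins2 %>%
  select(lat, long, cluster) %>%
  ggplot(aes(x = long, y = lat)) +
  # Country outlines underneath the points
  # (use size = 0.5 instead of linewidth on ggplot2 versions before 3.4.0)
  geom_map(data = WorldData, map = WorldData,
           aes(group = group, map_id = region),
           fill = "white", colour = "#7f7f7f", linewidth = 0.5) +
  # geom_map() does not set the axis limits, so show the whole map explicitly
  expand_limits(x = WorldData$long, y = WorldData$lat) +
  geom_point(aes(color = cluster))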
Here we drew the clusters on top of a world map, leaving out Antarctica. Compared to the graph I got at first, this one makes more sense and is more pleasing to the eye.
Music Origin Points
This was a fun study for learning some new methods of graphing data that can pinpoint locations around the world. It would be interesting to work with other data sets that share the same features, so that we could take this analysis further.
There are a couple of things that I would like to expand upon in the future:
With this new graph, we can focus on one region and draw conclusions about the types of music played in that area. Also, since some music styles derive from other styles, we could try to predict those connections from the features the styles have in common.