09 December, 2018

Introduction

Origins of Music

The data for my project came from the UCI website, which also had several other projects to choose from. The project I chose is about the origins of music.

Things I Did

  • Web scraped the data
  • Tidied the data just a bit
  • Made a graph of the points
  • Made some assumptions about furthering the project

Analysis

Data

The data set has a total of 118 variables and 1059 observations. The last two columns were the ones I was most concerned with, because they hold the latitude and longitude of the city, in the country the music is from.
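The column names V117 and V118 used in the code suggest the file was read without a header row, in which case R assigns the default names V1, V2, and so on. Here is a minimal sketch of that behaviour, assuming a comma-separated file; the tiny four-column table and the name `toy` are made up for illustration and stand in for the real data:

```r
# Toy stand-in for the real file: two feature columns followed by
# latitude and longitude, with no header row (all values are made up).
toy_text <- "0.12,-0.40,-23.55,-46.63
1.05,0.33,35.68,139.69
-0.71,0.98,52.52,13.40"

# header = FALSE makes R assign the default names V1, V2, ...
toy <- read.csv(text = toy_text, header = FALSE)

dim(toy)             # number of rows (objects) and columns (variables)
names(toy)           # the default column names
tail(names(toy), 2)  # the last two columns hold the coordinates
```

On the real data, `dim()` is a quick way to confirm the 1059 rows and 118 columns.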

Some Code

LatLong <- 
  musicOrigins %>% 
    select(V117, V118)

colnames(LatLong)[colnames(LatLong)=="V117"] <- "Latitude"
colnames(LatLong)[colnames(LatLong)=="V118"] <- "Longitude"

Here we made a new data table that keeps only the last two columns. We also replaced the default names with what the columns actually represent, Latitude and Longitude.
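The selecting and renaming can also be done in one step, since dplyr's `select()` accepts `new_name = old_name` pairs. A small sketch on a made-up table, where `toy`, `toyLatLong`, and the columns V3 and V4 are stand-ins for `musicOrigins`, `LatLong`, V117, and V118:

```r
library(dplyr)

# Toy stand-in for musicOrigins with only four columns (values are made up);
# V3 and V4 play the role of V117 and V118.
toy <- data.frame(V1 = c(0.1, 0.2), V2 = c(0.3, 0.4),
                  V3 = c(-23.55, 35.68), V4 = c(-46.63, 139.69))

# select() keeps only the listed columns and renames them at the same time
toyLatLong <- toy %>%
  select(Latitude = V3, Longitude = V4)

names(toyLatLong)
```

This is only an alternative spelling; the result has the same two columns as the version with `colnames()`.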

Changes Made

names(musicOrigins)[c(117, 118)] <- c("Latitude", "Longitude")

Here we renamed the two columns in the original data set directly, rather than making a new table that holds only those two columns.

The Graph

Plotting coordinate points

set.seed(1059)
musicClusters <- LatLong %>% 
  kmeans(centers = 6) %>% 
  fitted("class") %>% 
  as.character()
LatLong <- LatLong%>% 
  mutate(cluster = musicClusters)
LatLong %>% 
  ggplot(aes(x = Longitude, y = Latitude)) +
  geom_point(aes(color = cluster), alpha = 0.25)

There is a graphing option in Chapter 9 that plots points on a graph so that it looks like a map of the world. That is where kmeans helps: it divides the coordinate plane into different regions, six of them here.
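To see what `kmeans()` returns, here is a minimal sketch on made-up coordinates. The data frame `pts` and the choice of two centers are assumptions for illustration only; the slides use the real coordinates and six centers.

```r
set.seed(1)

# Made-up coordinates forming two well-separated groups
pts <- data.frame(Longitude = c(-47, -46, -45, 139, 140, 141),
                  Latitude  = c(-24, -23, -22, 35, 36, 37))

km <- kmeans(pts, centers = 2)

# Cluster labels: one integer per row of pts
km$cluster

# fitted(km, "classes") returns the same labels as km$cluster
all(fitted(km, "classes") == km$cluster)

# Cluster centers: one row per cluster, in the same units as the input
km$centers
```

The labels are arbitrary integers, which is why the slides convert them with `as.character()` before mapping them to color.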

Here is the Graph

Music Points on the World

Examining One of the Points

The lowest point on the left is one I examined. Its coordinates landed me in Brazil, near the city of Sao Paulo. After some research, I found that the Samba originated in Brazil. It is a pretty cool feature to be able to take these points and then look up where they came from.

Graph Redone

Clusters

set.seed(1059)
musicClusters <- musicOrigins %>%
  select(-Latitude, -Longitude) %>% 
  kmeans(centers = 6) %>% 
  fitted("class") %>% 
  as.character()
musicOrigins <- 
  musicOrigins %>% 
  mutate(cluster = musicClusters)

Here we made the clusters based on the music information alone, not on the latitudes and longitudes.
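A small sketch of what the negative `select()` does before clustering. The table `toy` and its feature columns `f1` and `f2` are made up for illustration; the real data has many more feature columns.

```r
library(dplyr)
set.seed(1)

# Made-up table: two audio features plus coordinates
toy <- data.frame(f1 = c(0.1, 0.2, 0.15, 5.0, 5.1, 4.9),
                  f2 = c(1.0, 1.1, 0.9, -3.0, -3.2, -2.9),
                  Latitude  = c(-24, 35, 52, -23, 36, 53),
                  Longitude = c(-47, 139, 13, -46, 140, 14))

# Negative selection drops the coordinates, so kmeans() sees only f1 and f2
features <- toy %>% select(-Latitude, -Longitude)
names(features)

featureClusters <- kmeans(features, centers = 2) %>%
  fitted("classes") %>%
  as.character()

# The labels follow the feature values, not the position on the map
toy %>% mutate(cluster = featureClusters)
```

In this toy table the rows with similar features sit in different parts of the world, so the cluster colors would be mixed across the map rather than grouped by region.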

Working the Points

n <- nrow(musicOrigins)
r <- 3
set.seed(4040)
musicOrigins2 <- 
  musicOrigins %>% 
  mutate(lat = Latitude + runif(n, min = -r, max = r),
         long = Longitude + runif(n, min = -r, max = r))

Now the points can be plotted near their origins, jittered for our convenience so that we can see them.
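Jittering matters when several rows share identical coordinates, since they would otherwise be drawn on top of each other. A minimal sketch of the same idea on three made-up points at a single location (the vectors `lat`, `long`, `lat_j`, and `long_j` are illustrative names only):

```r
set.seed(4040)

# Made-up coordinates: three tracks recorded at the same location
lat  <- c(-23.55, -23.55, -23.55)
long <- c(-46.63, -46.63, -46.63)

n <- length(lat)
r <- 3  # maximum shift in degrees, the same value used in the slides

# Each point moves by an independent uniform offset in [-r, r]
lat_j  <- lat  + runif(n, min = -r, max = r)
long_j <- long + runif(n, min = -r, max = r)

# The jittered points no longer coincide, and none moved more than r degrees
all(abs(lat_j - lat) <= r)
all(abs(long_j - long) <= r)
```

A larger `r` separates the points more but also moves them further from their true origin, so the value is a trade-off.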

Graphs

# Country outlines for the world map (map_data() needs the maps package)
WorldData <- map_data("world") %>%
  filter(region != "Antarctica")

musicOrigins2 %>% 
  select(lat, long, cluster) %>% 
  ggplot(aes(x = long, y = lat)) +
    geom_map(data=WorldData, map=WorldData,
             aes(group=group, map_id=region),
             fill="white", colour="#7f7f7f", size=0.5) +
    geom_point(aes(color = cluster))

Here we made a map that shows the world's regions, excluding Antarctica. Compared to the graph I got at first, this one makes more sense and is more pleasing to the eye.

New Graph

Music Origin Points

Conclusion

Wrapping it up

This was a fun study for learning some new methods of graphing data sets that can pinpoint things around the world. It would be interesting to work with other data sets that have the same kind of features, so that we could practice these methods more.

Further Work

There are a couple of things that I would like to expand upon in the future:

  • Understand more about the other columns and learn what those frequencies mean
  • Make a new column for the data set that gives the name of the country of origin
  • With those names, build some random forest or linear models to make predictions
  • Lastly, find out whether this relates to music software that predicts song names and their artists

Wrapping up the New Stuff

With this new graph we can focus on one region. From that region of the world, we can draw conclusions about the types of music played there. Also, since some music styles derive from other styles, we could predict those as well, because they have stylistic traits in common.