setwd("C:/Users/Seth/Documents/bandatablog/aa perCapHits")
homesPJ <- read.csv("data/homesPJ.csv", stringsAsFactors=F)
homesBB200 <- read.csv("data/homesBB200.csv", stringsAsFactors=F)
cityCount <- read.csv("data/cityCountMAN.csv", stringsAsFactors=F)
# function to format large numbers with commas
milli <- function(number) {
  # format() adds the thousands separators at any magnitude (the earlier
  # substr() approach returned NULL for values of one billion or more);
  # scientific = FALSE prevents strings such as "1e+06"
  format(number, big.mark = ",", scientific = FALSE, trim = TRUE)
}
This analysis attempts to determine if any correlation can be established between the size of a metropolitan area and the number of either hit records or critically acclaimed records that it produces. It may seem obvious that bigger cities would produce more successful artists, simply because there is a bigger pool of talent, but we will see that this is not always the case. We will also examine how many hits per capita these cities produce, to attempt to correct for the aforementioned pool-of-talent issue, and instead look at whether being in a large city is, in itself, positively correlated with having a hit record.
A Shiny app, which plots this data on an interactive map and allows searches of the database by artist and location, can be found at: https://seth127.shinyapps.io/JamsPerCapita
A version of this analysis without the R code visible can also be seen at: http://rpubs.com/seth127/JamsPerCapita
The source data for this study comes from the archives on Billboard.com, and from the Pazz & Jop archives, both on Robert Christgau’s website (1974-2007) and the Village Voice website (2008-2014). Links to those pages are at the bottom of this report. The Pazz & Jop Poll is a year-end critics’ list which aggregates weighted Top 10 Albums of the Year lists from several hundred critics. It was originally conceived and curated by Robert Christgau in 1974 and has been published by the Village Voice ever since. Also, we should note that by “#1 hit” in this analysis, we mean “album (not single) that reached #1 on the Billboard 200 albums chart for at least one week.” We are using Billboard data from 1984-2014, simply because that is what is available on their website.
The location data was generated with the custom-built findHome function. The function (accessible on the GitHub repository for this project) uses the rvest package to scrape the biographical data sidebar on the artist’s Wikipedia page. It should be noted that, with a few exceptions, this provides us with the location where the artist was born for solo artists or, for bands, where the band was formed. Bear that in mind for this analysis. Despite the fact that Tim McGraw found success in Nashville and Britney Spears found it through some series of unfathomably complex, non-geographic music industry machinations–for the purposes of this study they are both represented by their birthplaces: respectively Delhi, Louisiana and Kentwood, Louisiana. Though this may seem to underrepresent some large music cities, you will see that they are still plenty well represented. Also, this more accurately gets to the heart of whether or not large cities actually generate more talent, or if that talent simply flocks there once it has blossomed.
Finally, our population data was taken from the 2010 US Census. For the most part, we use the number for “metropolitan area.” The concept of what constitutes a “metropolitan area” will be discussed in some detail later. Also, it should be noted that the census data includes “micro areas” for many smaller cities where much of the population lives outside the city limits. For instance, Charlottesville, VA has fewer than 50,000 people living within the city, but the census lists “Charlottesville micro area” as having a population of 218,705. Where it is provided, we have used this larger number.
In certain ways, this analysis would have been more instructive if we could have used population data for a locale at the time the hit record was made, or perhaps when the artist was living there. However, for a number of reasons, this data was prohibitively hard to collect accurately. Therefore we chose to stick with the 2010 data across the board.
We begin by simply looking at the Top 10. The ten most populous metropolitan areas in America have shifted some over the past 30 years, but less than you might think. In 2010, they looked like this:
results <- cityCount[order(cityCount$pop, decreasing=T),c(9,2)]
#results[1:10,]
| City | Population |
|---|---|
| 1. New York City | 13,866,159 |
| 2. Los Angeles, CA | 12,828,837 |
| 3. Chicago, IL | 9,461,105 |
| 4. Dallas - Ft. Worth, TX | 6,426,214 |
| 5. Philadelphia, PA | 5,965,343 |
| 6. Houston, TX | 5,920,416 |
| 7. Washington, DC | 5,636,232 |
| 8. Miami, FL | 5,564,635 |
| 9. Atlanta, GA | 5,286,728 |
| 10. Boston, MA | 4,552,402 |
Now contrast that with the Top 10 cities, when ranked by number of Billboard #1 hits and Pazz & Jop selections:
results <- cityCount[order(cityCount$BB, decreasing=T),c(9,3,2)]
results <- results[results$BB>0,]
#results[1:10,]
| City | #1 hits | Population |
|---|---|---|
| 1. New York City | 79 | 13,866,159 |
| 2. Los Angeles, CA | 39 | 12,828,837 |
| 3. Chicago, IL | 19 | 9,461,105 |
| 4. Nashville, TN | 19 | 1,670,890 |
| 5. Atlanta, GA | 18 | 5,286,728 |
| 6. Detroit, MI | 16 | 4,296,250 |
| 7. Houston, TX | 12 | 5,920,416 |
| 8. Long Island, NY | 11 | 2,832,882 |
| 9. Seattle, WA | 11 | 3,439,809 |
| 10. Philadelphia, PA | 9 | 5,965,343 |
For Billboard hits, the Top 3 hold steady; however, there are some notable alterations. Famous music cities like Nashville and Detroit shoot up the list, while Dallas, Philadelphia, and Washington (none of which are known as music capitals) all fall. Atlanta also makes a case for its relevance here, jumping from #9 up into the Top 5. Also notable: Long Island and Seattle sneak into the Top 10.
A quick note on Long Island: for the purposes of this analysis “Long Island” refers to the counties of Suffolk and Nassau, and does not include the boroughs of Brooklyn and Queens. While this does not, strictly speaking, constitute a city, it does have a combined population of just under 3 million people. It has also produced a fair amount of notable music. We felt including it in New York City would be unfair and counting it as a dozen different towns of under 50,000 inhabitants would also be misleading.
results <- cityCount[order(cityCount$PJ, decreasing=T),c(9,4,2)]
results <- results[results$PJ>0,]
#results[1:10,]
| City | P&J Picks | Population |
|---|---|---|
| 1. New York City | 192 | 13,866,159 |
| 2. Los Angeles, CA | 77 | 12,828,837 |
| 3. Chicago, IL | 37 | 9,461,105 |
| 4. Detroit, MI | 30 | 4,296,250 |
| 5. San Francisco, CA | 26 | 4,335,391 |
| 6. Atlanta, GA | 24 | 5,286,728 |
| 7. Minneapolis, MN | 22 | 3,348,859 |
| 8. Dallas - Ft. Worth, TX | 20 | 6,426,214 |
| 9. Athens, GA | 19 | 192,541 |
| 10. Philadelphia, PA | 17 | 5,965,343 |
Although the Top 3 stay the same, this is a surprisingly different list from both the raw population ranking and the Billboard ranking. San Francisco, though it has produced almost no Billboard #1’s over the past 30 years, storms into the Top 5 as the home of 26 critically acclaimed records. Similarly, Minneapolis–home to a storied, semi-underground 80’s and 90’s rock scene featuring Hüsker Dü, The Replacements, The Jayhawks, and Soul Asylum, among others–makes its presence felt. Despite all this, Prince remains the only Twin Cities native to top the Billboard charts. Surprisingly, Dallas - Ft. Worth, though it is not known as a hot spot for hip, critically lauded music, reclaims its spot in the Top 10.
Also of note, we see Athens, GA making a strong showing at #9. This is particularly notable because Athens has by far the smallest population in the Top 10, smaller than its closest competition (Minneapolis) by a factor of about 17! Athens is an interesting case, and somewhat unique. In the following section, we will examine the “Hometown Hero” phenomenon, wherein a single artist literally puts a small town on the map. R.E.M. can safely be called Athens’ Hometown Heroes. However, there are six different artists calling Athens home who have albums appearing on the Pazz & Jop poll. This is pretty strong evidence for crowning Athens “The Best Music Small Town In America”. There’s even a Music of Athens, Georgia Wikipedia page.
A note on New York City, before we move on: While New York City leads in every category discussed so far, it leads by an astonishing margin on the Pazz & Jop list. There is an ongoing argument in the music world that music from New York is disproportionately celebrated by critics because, simply put, a lot of music critics live and work in New York. That is a fair point, and there is certainly some truth to it. That said, New York is widely considered the cultural capital of this country and arguably the Western World. This is not without reason. Critical bias aside, native New Yorkers have produced some of the most compelling music of the past half century, from Brill Building pop to Punk Rock to Hip Hop. Now let the argument continue…
Reordering the rankings based on hits or picks per capita unsurprisingly changes things drastically. This is where the previously mentioned Hometown Hero phenomenon really comes to the forefront.
results <- cityCount[order(cityCount$BBpc, decreasing=T),c(9,5,2)]
results <- results[results$BBpc>0,]
#results[1:10,]
| City | #1 Hits per Mil | Population |
|---|---|---|
| 1. Rochester Hills, MI | 112.6839918 | 70,995 |
| 2. Grapevine, TX | 64.7472698 | 46,334 |
| 3. Aberdeen, WA | 54.9473193 | 72,797 |
| 4. Lawrence, MA | 39.2788405 | 76,377 |
| 5. Pasadena, CA | 36.4638789 | 137,122 |
| 6. Charlottesville, VA | 27.434215 | 218,705 |
| 7. Palm Desert, CA | 20.6419651 | 48,445 |
| 8. Burbank, CA | 19.3535901 | 103,340 |
| 9. Danville, KY | 18.8061835 | 53,174 |
| 10. New Brunswick, NJ | 18.1221797 | 55,181 |
results <- cityCount[order(cityCount$PJpc, decreasing=T),c(9,6,2)]
results <- results[results$PJpc>0,]
#results[1:10,]
| City | P&J Picks per Mil | Population |
|---|---|---|
| 1. Hoboken, NJ | 139.9860014 | 50,005 |
| 2. Plainfield, NJ | 100.3854802 | 49,808 |
| 3. Athens, GA | 98.6802811 | 192,541 |
| 4. Vernon, TX | 73.8825268 | 13,535 |
| 5. Forrest City, AR | 70.7764173 | 28,258 |
| 6. Rochester Hills, MI | 70.4274949 | 70,995 |
| 7. Ruston, LA | 64.1917193 | 46,735 |
| 8. Kennet, MO | 62.5919319 | 31,953 |
| 9. Palm Desert, CA | 61.9258953 | 48,445 |
| 10. Aberdeen, WA | 54.9473193 | 72,797 |
As you can see, not one of the nation’s largest cities appears in either Top 10. In fact, Charlottesville, VA is the only city with a population over 200,000 on either list. Incidentally, Charlottesville is my hometown. Dave Matthews Band comes from here and I can tell you from personal experience: as far as music goes, Dave is the definition of a Hometown Hero. Of the other towns over 100,000 in population, one is Athens, GA (previously discussed) and the other two are Pasadena and Burbank, both arguably suburbs of Los Angeles.
As for the rest, they are each small towns which produced a single superstar. Rochester Hills, MI is Madonna’s hometown. Aberdeen, WA sired Kurt Cobain, while Grapevine, TX was home to Norah Jones. On the P&J list, Hoboken, NJ produced critical darlings Yo La Tengo, while tiny Vernon, TX is the legendary Roy Orbison’s birthplace. Slightly breaking the mold, humble Plainfield, NJ was home to both George Clinton and Dionne Farris, although George arguably came to his musical maturity in Detroit.
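The per-million figures in the two tables above are simply counts divided by population and scaled by one million. A minimal sketch, assuming a hypothetical helper named `per_million` (not part of the original code) and using three rows copied from the Pazz & Jop table earlier in this report:

```r
# Hypothetical helper: count per one million residents
per_million <- function(count, pop) count / pop * 1e6

# Three rows copied from the Pazz & Jop Top 10 table in this report
cities <- data.frame(
  city = c("Athens, GA", "New York City", "Los Angeles, CA"),
  PJ   = c(19, 192, 77),
  pop  = c(192541, 13866159, 12828837)
)

# Add the per-million rate and rank by it
cities$PJpc <- per_million(cities$PJ, cities$pop)
cities[order(cities$PJpc, decreasing = TRUE), ]
```

Even in this three-row sample, Athens outranks New York and Los Angeles once the counts are scaled by population, which is the Hometown Hero effect in miniature.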
Forrest City, AR brings up an important issue that is worth discussing. A tiny town of only 30,000 people, Forrest City was the birthplace of The Right Rev. Al Green. However, Al made his mark coming out of the musical hotbed of Memphis, TN, only 45 miles down the road (and across America’s most storied and celebrated river). Is it fair to steal this credit from Memphis? How about Elvis Presley, Otis Redding, and the plethora of other great artists who grew up in small towns right across the river, or right over the border in northern Mississippi, and made their way into Memphis to stake their claim on musical history? This is especially difficult when you consider that New York, LA, and to some extent other large cities, get the benefit of their outlying provincial towns because the US census counts them as part of a metropolis. Not so for Memphis.
Atlanta and Houston are examples of cities that fall in between on this question. Both have a large metropolitan population spread over a large area, but also outlying communities such as Port Arthur, TX or Marietta, GA that have produced notable artists and arguably fall under the cultural umbrella of the larger city.
It is a grey area, and a matter of interpretation. For this analysis, I have let the US Census Bureau be the arbitrary decision maker, for two reasons. First, it is prohibitively complicated to go through every entry and manually decide whether it should be considered part of a larger cultural community. Second, in most cases, it does make sense to say that, for instance, Yonkers, NY is “part of” New York City, whereas Forrest City is not part of Memphis.
This is all very anecdotally interesting, and certainly fun to look at if you’re a music fan, but the question of whether or not there is a statistical correlation between popular music output and population remains open.
We begin to answer that question by looking at a simple table of Pearson correlations.
| Comparison | Correlation |
|---|---|
| Population vs. Billboard #1 Hits | 0.7541742 |
| Population vs. Pazz & Jop Picks | 0.7525085 |
| Population vs. Billboard #1 Hits per Capita | -0.1380792 |
| Population vs. Pazz & Jop Picks per Capita | -0.2526154 |
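A table like the one above can be assembled with base R's `cor()`, which computes the Pearson correlation by default. The sketch below is an illustration only: it uses the ten rows of the Billboard Top 10 table from this report rather than the full `cityCount` data, so its numbers will not match the table, and the names `top10` and `corTable` are assumptions.

```r
# Ten rows copied from the Billboard Top 10 table above (not the full data set)
top10 <- data.frame(
  pop = c(13866159, 12828837, 9461105, 1670890, 5286728,
          4296250, 5920416, 2832882, 3439809, 5965343),
  BB  = c(79, 39, 19, 19, 18, 16, 12, 11, 11, 9)
)
# Hits per million residents
top10$BBpc <- top10$BB / top10$pop * 1e6

# cor() uses method = "pearson" unless told otherwise
corTable <- data.frame(
  Comparison  = c("Population vs. Billboard #1 Hits",
                  "Population vs. Billboard #1 Hits per Mil"),
  Correlation = c(cor(top10$pop, top10$BB),
                  cor(top10$pop, top10$BBpc))
)
corTable
```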
A quick look shows us that population does seem to be positively correlated with both #1 Hits and Pazz & Jop selections. Overall, this makes sense. The more people that live in the city, the more likely it is that some of them will write great songs and make great records.
However, in the per capita categories, we see a negative correlation. This means that, as a city gets larger, there are fewer musical luminaries per citizen than in a smaller city. This is slightly counterintuitive, but you can certainly make some sense of it. Perhaps there is something inspiring or artistically nurturing about living in a smaller town. Or perhaps the Hometown Heroes are dragging down the correlation disproportionately. Still, for both Billboard and Pazz & Jop, it is only a small negative correlation. It is hard to imagine that a correlation of -.14 or even -.25 could be terribly meaningful, no matter what the explanation.
When plotting the data, we can clearly see some large outliers.
par(mfrow=c(1,1), mar= c(5, 4, 2, 2))
plot(cityCount$PJ, cityCount$pop/1000000, xlim=c(0,80), type="n",
xlab="Number of Albums",
ylab="Population, in millions")
points(cityCount$PJ, cityCount$pop/1000000, col="blue")
points(cityCount$BB, cityCount$pop/1000000, col="red")
abline(lm(cityCount$pop/1000000 ~ cityCount$BB), col="red")
abline(lm(cityCount$pop/1000000 ~ cityCount$PJ), col= "blue")
legend("bottomright", c("Pazz & Jop", "Billboard"),
col= c("blue", "red"), pch = 1)
Most of the data points are clustered with fewer than 40 notable records and fewer than 3 or 4 million inhabitants. To look at the patterns happening in the heart of the data, we will remove the top and the bottom and see how things change. Roughly 9% of the cities in our data have a population over 4 million. Likewise, about 12.5% of our cities have a population of less than 60,000. Below, you can see the correlation table when these extremes have been removed.
mid <- cityCount[cityCount$pop<4000000 & cityCount$pop>60000,]
| Comparison | Correlation |
|---|---|
| Population vs. Billboard #1 Hits | 0.2321641 |
| Population vs. Pazz & Jop Picks | 0.3718606 |
The raw correlations are much lower now, indicating that several large cities are driving most of the correlation. This tells us that very large cities (over 4 million people, for instance) produce a lot of notable music, but perhaps once you get below a certain threshold, the size of the city matters less in determining musical output.
Our question becomes, what is that threshold? The graph below gives us some idea, though it may take a minute to understand what you are looking at.
threshold <- seq(50000, 14000000, 50000)
threshData <- data.frame(threshold = threshold,
corBB = numeric(length(threshold)),
corPJ = numeric(length(threshold)))
for (i in 1:length(threshold)) {
threshData[i,2] <- cor(cityCount[cityCount$pop<threshold[i],2],
cityCount[cityCount$pop<threshold[i],3])
threshData[i,3] <- cor(cityCount[cityCount$pop<threshold[i],2],
cityCount[cityCount$pop<threshold[i],4])
}
par(mfrow=c(1,1), mar= c(5, 4, 2, 2))
plot(threshData$threshold/1000000, threshData$corPJ, type="n",
xlab="Threshold (population, in millions)",
ylab="Correlation", yaxt="n")
axis(2, at = c(0, .25, .5, .75))
lines(threshData$threshold/1000000, threshData$corBB, col="red")
lines(threshData$threshold/1000000, threshData$corPJ, col="blue")
abline(h=0, lty=2, col="lightgray")
abline(h=.25, lty=2, col="lightgray")
abline(h=.5, lty=2, col="lightgray")
points(3.5,0.377999000, pch="*", cex=3)
legend("bottomright", c("Pazz & Jop", "Billboard"),
col= c("blue", "red"), lty = 1)
The X-axis shows the threshold. Cities above this threshold are not considered when computing the corresponding correlation (on the Y-axis). For instance, the point marked on the plot with an asterisk indicates that, when only cities with less than 3.5 million inhabitants are considered, the correlation between population and number of P&J picks (indicated by the blue line) is about .378.
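The calculation behind each point on the curve can be sketched as a small function. This is a hedged illustration, not the code used for the plot: `cor_below` is a hypothetical helper, columns are referenced by name instead of by position, and the data are the ten rows of the Pazz & Jop Top 10 table from this report, not the full `cityCount` data.

```r
# Hypothetical helper: Pearson correlation between population and a count
# column, using only the cities below the given population threshold
cor_below <- function(data, threshold, count_col) {
  below <- data[data$pop < threshold, ]
  # cor() needs at least two rows to return a number
  if (nrow(below) < 2) return(NA_real_)
  # A constant column would trigger a warning and yield NA
  suppressWarnings(cor(below$pop, below[[count_col]]))
}

# Ten rows copied from the Pazz & Jop Top 10 table above (not the full data set)
pjTop10 <- data.frame(
  pop = c(13866159, 12828837, 9461105, 4296250, 4335391,
          5286728, 3348859, 6426214, 192541, 5965343),
  PJ  = c(192, 77, 37, 30, 26, 24, 22, 20, 19, 17)
)

# One correlation per threshold, as in the loop above
thresholds <- c(5000000, 10000000, 14000000)
res <- sapply(thresholds, function(t) cor_below(pjTop10, t, "PJ"))
res
```

Returning `NA` when fewer than two cities fall below the threshold keeps the lowest thresholds from raising errors.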
You’ll note that, at very small populations, the correlation is high. This indicates that in towns of fewer than 50,000 or 100,000 people, a higher population produces more hit records. However, between about 100,000 and about 3,000,000 there is very low correlation.
Another way to examine this pattern is by looking at the p-values for the regression lines. Again, the X-axis shows the threshold, meaning we are only looking at cities below that population. In these plots, however, the Y-axis shows the p-value of a regression which models either Billboard or P&J data based on population, for cities below that threshold.
threshold <- seq(50000, 14000000, 50000)
pData <- data.frame(threshold = threshold,
pBB = numeric(length(threshold)),
pPJ = numeric(length(threshold)))
for (i in 1:length(threshold)) {
pData[i,2] <- summary(lm(cityCount[cityCount$pop<threshold[i],2]
~ cityCount[cityCount$pop<threshold[i],3]))$coeff[2,4]
pData[i,3] <- summary(lm(cityCount[cityCount$pop<threshold[i],2]
~ cityCount[cityCount$pop<threshold[i],4]))$coeff[2,4]
}
par(mfrow=c(1,1), mar= c(5, 4, 2, 2))
plot(pData$threshold/1000000, pData$pPJ, type="n",
xlab="Threshold (population, in millions)",
ylab="p-value")
lines(pData$threshold/1000000, pData$pPJ, col="blue")
lines(pData$threshold/1000000, pData$pBB, col="red")
abline(h=.01, lty=2, col="lightgray")
abline(h=.05, lty=2, col="lightgray")
legend("topright", c("Pazz & Jop", "Billboard"),
col= c("blue", "red"), lty = 1)
Again, we see the same pattern. When only considering smaller cities, the p-values are extremely high. This means that a correlation as weak as the one we are seeing could easily arise by chance (sampling error, for instance) even if there were no actual connection between our two variables.
The plot below shows the same data, but zooms in on the point where both lines drop below a p-value of .05 (the conventional cutoff for statistical significance). This happens at a population of around 1,750,000. However, .01 is a considerably stricter standard. It is not until the threshold reaches almost 3,500,000 that both p-values permanently drop below .01.
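One detail about the p-value code: the loop regresses population (column 2) on the record count, while the prose describes modeling the count from population. For a regression with a single predictor the slope's p-value is the same in either direction, and it also equals the p-value that `cor.test()` reports for the Pearson correlation. A small check, using the ten rows of the Billboard Top 10 table from this report as stand-in data (not the full data set):

```r
# Ten rows copied from the Billboard Top 10 table (not the full data set)
pop <- c(13866159, 12828837, 9461105, 1670890, 5286728,
         4296250, 5920416, 2832882, 3439809, 5965343)
BB  <- c(79, 39, 19, 19, 18, 16, 12, 11, 11, 9)

# p-value of the slope, count modeled from population
p_lm <- summary(lm(BB ~ pop))$coefficients[2, 4]
# p-value of the slope with the roles swapped, as in the loop above
p_lm_swapped <- summary(lm(pop ~ BB))$coefficients[2, 4]
# p-value of the Pearson correlation test
p_cor <- cor.test(pop, BB)$p.value

c(lm = p_lm, lm_swapped = p_lm_swapped, cor.test = p_cor)
```

All three values agree, so the p-value curves are the same whichever way the regression is written.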
#zoomed plot
plot(pData$threshold/1000000, pData$pPJ, type="n",
xlim=c(1.5,4), ylim=c(0,.1),
xlab="Threshold (population, in millions)",
ylab="p-value", yaxt="n")
axis(2, at = c(.01,.05,.10))
lines(pData$threshold/1000000, pData$pPJ, col="blue")
lines(pData$threshold/1000000, pData$pBB, col="red")
abline(h=.01, lty=2, col="lightgray")
abline(h=.05, lty=2, col="lightgray")
legend("topright", c("Pazz & Jop", "Billboard"),
col= c("blue", "red"), lty = 1)
This reiterates that, for medium-sized cities, there is almost no correlation between population and musical output. As a sample, look at the correlation table when we consider only cities with fewer than 700,000 people:
sevenh <- cityCount[cityCount$pop<700000,]
| Comparison | Correlation |
|---|---|
| Population vs. Billboard #1 Hits | 0.0354803 |
| Population vs. Pazz & Jop Picks | -0.0333662 |
For these small- to medium-sized cities, there is effectively no correlation at all between population and either hit records or critically acclaimed records.
As we move up on the X-axis, the lines level out and you can begin to see the effect that individual large cities have on the correlation. It is not until we cross the 3,500,000 threshold that we see both the Billboard and P&J correlations jump above .25 permanently, and the p-values both drop below .01. This is the point where Minneapolis and Seattle, both significant music cities, are factored into our model.
thirtyfive <- cityCount[cityCount$pop<3500000,]
| Comparison | Correlation |
|---|---|
| Population vs. Billboard #1 Hits | 0.2700723 |
| Population vs. Pazz & Jop Picks | 0.377999 |
However, .378 is still not a very convincing correlation. The Pazz & Jop correlation does not even cross the .5 mark until Boston (the 10th largest city in America) is added in at 4,600,000. More and more, it is looking like the influence of the several largest cities in America is the only thing lending any statistical significance to our correlation.
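Statements such as "jumps above .25 permanently" or "does not cross the .5 mark until" can be checked mechanically by finding the first threshold after which a curve never drops back to the cutoff. The sketch below assumes a hypothetical helper, `first_permanent`, and uses only the eight labelled P&J points from the final plot of this report (thresholds in millions), so its answer is much coarser than one computed from the full `threshData`.

```r
# Hypothetical helper: first threshold from which every later value
# stays above the cutoff (NA if the last value is not above it)
first_permanent <- function(threshold, value, cutoff) {
  above <- !is.na(value) & value > cutoff
  # Position of the last value at or below the cutoff (0 if there is none)
  last_below <- max(c(0, which(!above)))
  if (last_below == length(value)) return(NA_real_)
  threshold[last_below + 1]
}

# Eight labelled P&J points from the final plot (thresholds in millions)
th    <- c(3.35, 3.5, 4.35, 5.3, 6.45, 9.5, 12.85, 13.9)
corPJ <- c(0.317427393, 0.377999000, 0.482784702, 0.566413047,
           0.582999223, 0.665811481, 0.769592971, 0.752508541)

first_permanent(th, corPJ, cutoff = 0.5)
first_permanent(th, corPJ, cutoff = 0.7)
```

Applied to `threshData$threshold` and `threshData$corPJ`, the same function would work at the full 50,000-person resolution of the threshold sequence.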
Below is the same plot we looked at earlier, but with the influence of certain notable cities labeled.
par(mfrow=c(1,1), mar= c(5, 4, 2, 2))
plot(threshData$threshold/1000000, threshData$corPJ, type="n",
xlab="Threshold (population, in millions)",
ylab="Correlation", yaxt="n",
xlim=c(0,14.74), ylim=c(0,.823))
axis(2, at = c(0, .25, .5, .75))
lines(threshData$threshold/1000000, threshData$corBB, col="red")
lines(threshData$threshold/1000000, threshData$corPJ, col="blue")
abline(h=0, lty=2, col="lightgray")
abline(h=.25, lty=2, col="lightgray")
abline(h=.5, lty=2, col="lightgray")
legend("bottomright", c("Pazz & Jop", "Billboard"),
col= c("blue", "red"), lty = 1)
#Minneapolis
points(3.35,0.317427393, pch="*", cex=2)
text(2, .33, labels="Minneapolis")
#Seattle
points(3.5,0.377999000, pch="*", cex=2)
text(2.7, .40, labels="Seattle")
#San Fran/Detroit
points(4.35,0.482784702, pch="*", cex=2)
text(3.45, .48, labels="Detroit")
text(3.15, .53, labels="San Francisco/")
#Atlanta
points(5.3,0.566413047, pch="*", cex=2)
text(4.45, .59, labels="Atlanta")
#DFW
points(6.45,0.582999223, pch="*", cex=2)
text(7.1, .62, labels="Dallas - Ft. Worth")
#CHI
points(9.5,0.665811481, pch="*", cex=2, col="darkorchid4")
text(9.4, .697, labels="Chicago")
#LA
points(12.85,0.769592971, pch="*", cex=2, col="darkorchid4")
text(12.3, .8, labels="Los Angeles")
#NYC
points(13.9,0.752508541, pch="*", cex=2, col="darkorchid4")
text(14.55, .77, labels="NYC")
The final three marks on the plot (shown in purple) represent The Big Three: Chicago, Los Angeles, and New York City. It is not until we cross the 9,500,000 threshold, adding Chicago into the model, that the Billboard correlation tops .5 and the P&J correlation jumps up to a more significant .66. Adding Los Angeles, and then New York, decisively increases both correlations.
Our final conclusion is a slightly surprising one. There are several large and influential cities which produce the bulk of this country’s most successful and critically acclaimed musicians. However, when these few immense urban centers are ignored, there is very little correlation between the population of a city and the number of either #1 Hits or critically acclaimed albums that a city produces.
This lack of correlation is particularly stark when considering only small- and medium-sized cities. A city like Syracuse, NY or Winston-Salem, NC (both about 600,000 people) is no more likely to produce a notable artist than Passaic, NJ which has about one tenth the population. Even as larger cities are considered, we don’t see strong correlations (over .7) in either category until the two undisputed musical capitals, New York and LA, are added to the model.
In conclusion, look around you. Just because you live in Podunk, Middle America doesn’t mean the next Michael Jackson isn’t living on your street.
A Shiny app which plots this data on an interactive map and allows searches of the database by artist and location, can be found at: https://seth127.shinyapps.io/JamsPerCapita
Another Shiny app which plots this data by year and looks at geographic changes over the past 30 years can be found at: https://seth127.shinyapps.io/WheresHot
The GitHub repository containing code for this analysis, as well as the Shiny app, can be found at: https://github.com/seth127/bandatablog/blob/master/aa%20perCapHits
For Metropolitan and Micropolitan Statistical Areas: http://www.census.gov/popest/data/metro/totals/2014/index.html
For Cities and Towns (Incorporated Places and Minor Civil Divisions): http://www.census.gov/popest/data/cities/totals/2014/index.html
NOTE: For the census data, I programmatically searched both of the above databases for a match and, if a match was found in each, used the one with a larger population. In most cases, this provided the data from the Metro/Micro database. The program which did this can be found in the GitHub repo, inside the build cityCount.R file.
The location data was generated with the custom-built findHome function, accessible on the GitHub repository for this project. The workings of the function are addressed above, under The Data heading.
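The "use the larger population" rule can be sketched in a few lines of base R. This is not the code from build cityCount.R. The table names, the two "ZZ" towns, and the city-limits figure for Charlottesville are invented for illustration. Only the 218,705 micro-area figure comes from this report.

```r
# Hypothetical stand-ins for the two census files; apart from the
# Charlottesville micro-area figure, these numbers are invented
metro  <- data.frame(city = c("Charlottesville, VA", "Exampleville, ZZ"),
                     pop  = c(218705, 90000))
places <- data.frame(city = c("Charlottesville, VA", "Smalltown, ZZ"),
                     pop  = c(45000, 12000))

# Keep every city that appears in either table
both <- merge(metro, places, by = "city", all = TRUE,
              suffixes = c(".metro", ".place"))

# pmax() with na.rm = TRUE takes the larger figure, or the only one available
both$pop <- pmax(both$pop.metro, both$pop.place, na.rm = TRUE)
both[, c("city", "pop")]
```

A full outer merge followed by `pmax()` covers both cases described in the note: a match in each database, and a match in only one.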