Overview and Data Preparation

This project uses the free basic version of the World Cities Database from the simplemaps website, which lists the latitude and longitude of major cities and towns around the world along with other information such as country, administrative region, capital status and population.

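# Download the zip archive to a temporary file, read the CSV inside it,
# then delete the temporary file
temp <- tempfile(fileext = ".zip")
download.file("https://simplemaps.com/static/data/world-cities/basic/simplemaps_worldcities_basicv1.74.zip",
              temp, mode = "wb")
# encoding = "UTF-8" keeps accented names such as Tōkyō intact
data <- read.csv(unz(temp, "worldcities.csv"), encoding = "UTF-8")
unlink(temp)
head(data, 3)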
     city city_ascii     lat      lng   country iso2 iso3 admin_name capital
1   Tokyo      Tokyo 35.6897 139.6922     Japan   JP  JPN      Tōkyō primary
2 Jakarta    Jakarta -6.2146 106.8451 Indonesia   ID  IDN    Jakarta primary
3   Delhi      Delhi 28.6600  77.2300     India   IN  IND      Delhi   admin
  population         id
1   37977000 1392685764
2   34540000 1360771077
3   29617000 1356872604

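The full dataset contains more than this assignment needs, so to simplify it only five columns are kept: city, lat, lng, capital and population. Next, only national capitals (rows where capital is "primary") are selected, and the capital column is then dropped because it no longer carries any information.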

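# Keep only the columns needed for mapping
mydf <- data[, c("city", "lat", "lng", "capital", "population")]
# Keep national (primary) capitals, then drop the capital column
mydf <- mydf[mydf$capital == "primary", ]
mydf <- mydf[, c("city", "lat", "lng", "population")]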
head(mydf)
          city     lat      lng population
1        Tokyo 35.6897 139.6922   37977000
2      Jakarta -6.2146 106.8451   34540000
5       Manila 14.6000 120.9833   23088000
8        Seoul 37.5600 126.9900   21794000
9  Mexico City 19.4333 -99.1333   20996000
11     Beijing 39.9050 116.3914   19433000
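For comparison, the same filtering and column selection can be written as a single call to base R's subset(). The sketch below runs on three rows transcribed from the head(data, 3) output (the name sample_data is illustrative only), so it works without downloading the file.

```r
# Three rows transcribed from the head(data, 3) output (iso/id columns omitted)
sample_data <- data.frame(
  city = c("Tokyo", "Jakarta", "Delhi"),
  lat = c(35.6897, -6.2146, 28.6600),
  lng = c(139.6922, 106.8451, 77.2300),
  capital = c("primary", "primary", "admin"),
  population = c(37977000, 34540000, 29617000)
)

# Filter rows and pick columns in one call; capital is used for the
# filter but is not kept in the result
capitals <- subset(sample_data, capital == "primary",
                   select = c(city, lat, lng, population))
print(capitals)
```

One behavioural difference: subset() drops rows where the condition evaluates to NA, whereas mydf[mydf$capital == "primary", ] returns all-NA rows for them. read.csv() reads blank fields in character columns as empty strings rather than NA, so for this file both approaches should give the same result.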

World Capitals

The map shows the world capitals, with each circle sized according to the capital's population relative to the others.
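The source does not show how this map was drawn or how the circle sizes were computed. A minimal sketch, assuming the leaflet package: the hypothetical helper circle_radius() scales each radius by the square root of population relative to the largest city, so circle area (not radius) is proportional to population. The parameter max_radius_m is an arbitrary choice, in metres (the unit addCircles() uses for radius), and the sample rows are transcribed from the head(mydf) output.

```r
# Hypothetical helper: circle radius in metres chosen so that circle AREA is
# proportional to population; the most populous city gets max_radius_m
circle_radius <- function(population, max_radius_m = 3e5) {
  sqrt(population / max(population, na.rm = TRUE)) * max_radius_m
}

# Four capitals transcribed from the head(mydf) output
capitals <- data.frame(
  city = c("Tokyo", "Jakarta", "Manila", "Seoul"),
  lat = c(35.6897, -6.2146, 14.6000, 37.5600),
  lng = c(139.6922, 106.8451, 120.9833, 126.9900),
  population = c(37977000, 34540000, 23088000, 21794000)
)
capitals$radius <- circle_radius(capitals$population)

# Build the map only if leaflet is installed (assumption: the source does not
# name the mapping package). In an R Markdown chunk, evaluate m to render it.
m <- NULL
if (requireNamespace("leaflet", quietly = TRUE)) {
  m <- leaflet::leaflet(capitals)
  m <- leaflet::addTiles(m)
  m <- leaflet::addCircles(m, lng = ~lng, lat = ~lat, radius = ~radius,
                           weight = 1, popup = ~city)
}
```

The square root matters because readers compare circles by area: with a radius proportional to population, Tokyo's circle would cover about three times the area of Seoul's even though its population is only about 1.7 times larger.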

The map shows the world capitals grouped into clusters.
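Again assuming leaflet (the source does not name the package), one way to produce such a map is marker clustering with markerClusterOptions(), which groups nearby markers into numbered clusters that break apart as the map is zoomed in. This is Leaflet's zoom-based grouping, not a statistical method such as k-means. The six sample rows are transcribed from the head(mydf) output.

```r
# Six capitals transcribed from the head(mydf) output
capitals <- data.frame(
  city = c("Tokyo", "Jakarta", "Manila", "Seoul", "Mexico City", "Beijing"),
  lat = c(35.6897, -6.2146, 14.6000, 37.5600, 19.4333, 39.9050),
  lng = c(139.6922, 106.8451, 120.9833, 126.9900, -99.1333, 116.3914),
  population = c(37977000, 34540000, 23088000, 21794000, 20996000, 19433000)
)

# Build the clustered map only if leaflet is installed (assumption)
m <- NULL
if (requireNamespace("leaflet", quietly = TRUE)) {
  m <- leaflet::leaflet(capitals)
  m <- leaflet::addTiles(m)
  # clusterOptions merges nearby markers into numbered clusters that
  # split into individual markers as the user zooms in
  m <- leaflet::addMarkers(m, lng = ~lng, lat = ~lat, popup = ~city,
                           clusterOptions = leaflet::markerClusterOptions())
}
```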