This project uses the free World Cities dataset from SimpleMaps (simplemaps.com), which lists the latitude and longitude of major cities and towns around the world, along with other information such as country, administrative region, capital status, and population.
# Download temp zip file to obtain data
temp <- tempfile()
download.file("https://simplemaps.com/static/data/world-cities/basic/simplemaps_worldcities_basicv1.74.zip", temp)
# Read the CSV straight from the zip archive; mark strings as UTF-8 so
# accented names (e.g. Tōkyō) are not garbled on non-UTF-8 locales
data <- read.csv(unz(temp, "worldcities.csv"), encoding = "UTF-8")
unlink(temp)
head(data, 3)
     city city_ascii     lat      lng   country iso2 iso3 admin_name capital
1   Tokyo      Tokyo 35.6897 139.6922     Japan   JP  JPN      Tōkyō primary
2 Jakarta    Jakarta -6.2146 106.8451 Indonesia   ID  IDN    Jakarta primary
3   Delhi      Delhi 28.6600  77.2300     India   IN  IND      Delhi   admin
  population         id
1   37977000 1392685764
2   34540000 1360771077
3   29617000 1356872604
The full dataset contains more than this assignment needs, so only five columns are kept: city, lat, lng, capital, and population. The rows are then filtered to primary capital cities, after which the capital column is dropped because it no longer varies.
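# Keep only the columns of interest (selected by name rather than position)
mydf <- data[, c("city", "lat", "lng", "capital", "population")]
# Keep only national (primary) capitals
mydf <- mydf[mydf$capital == "primary", ]
# Drop the capital column, which is now constant
mydf <- mydf[, c("city", "lat", "lng", "population")]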
head(mydf)
          city     lat      lng population
1        Tokyo 35.6897 139.6922   37977000
2      Jakarta -6.2146 106.8451   34540000
5       Manila 14.6000 120.9833   23088000
8        Seoul 37.5600 126.9900   21794000
9  Mexico City 19.4333 -99.1333   20996000
11     Beijing 39.9050 116.3914   19433000
The first map shows the world capitals, with each circle sized according to the city's relative population.
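The code that produced this map is not included in the text. The following is a minimal sketch of one way to draw such a map, assuming the leaflet package is used. The data frame name `caps`, the `radius` column, and the scaling factor of 50 are illustrative assumptions, not taken from the original; the six rows are the ones printed by head(mydf).

```r
library(leaflet)  # assumption: the leaflet package is installed

# Hypothetical sample: the six capitals shown in the head(mydf) output
caps <- data.frame(
  city = c("Tokyo", "Jakarta", "Manila", "Seoul", "Mexico City", "Beijing"),
  lat = c(35.6897, -6.2146, 14.6000, 37.5600, 19.4333, 39.9050),
  lng = c(139.6922, 106.8451, 120.9833, 126.9900, -99.1333, 116.3914),
  population = c(37977000, 34540000, 23088000, 21794000, 20996000, 19433000)
)

# Circle radius in metres. Taking the square root makes circle area
# proportional to population; the factor 50 is an arbitrary choice
# that keeps circles visible at world zoom.
caps$radius <- sqrt(caps$population) * 50

capital_map <- leaflet(caps) %>%
  addTiles() %>%                      # default OpenStreetMap base layer
  addCircles(lng = ~lng, lat = ~lat,
             radius = ~radius,        # size reflects relative population
             weight = 1,              # thin circle outline
             popup = ~city)           # click a circle to see the city name

capital_map  # printing the object renders the interactive map
```

Replacing `caps` with the full `mydf` data frame would plot every primary capital; rows with a missing population would need to be dropped or handled first.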
The second map shows the world capitals grouped into clusters, which split into individual markers as the map is zoomed in.
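As with the previous map, the original code is not shown. A clustered view could be sketched as follows, again assuming the leaflet package; the name `caps_small` and its three sample rows (taken from the printed output) are illustrative only.

```r
library(leaflet)  # assumption: the leaflet package is installed

# Hypothetical sample: three of the capitals shown in the head(mydf) output
caps_small <- data.frame(
  city = c("Tokyo", "Seoul", "Beijing"),
  lat = c(35.6897, 37.5600, 39.9050),
  lng = c(139.6922, 126.9900, 116.3914)
)

cluster_map <- leaflet(caps_small) %>%
  addTiles() %>%                      # default OpenStreetMap base layer
  addMarkers(lng = ~lng, lat = ~lat,
             popup = ~city,           # click a marker to see the city name
             # Nearby markers are merged into one numbered cluster icon
             clusterOptions = markerClusterOptions())

cluster_map  # printing the object renders the interactive map
```

Clustering keeps a world-scale map readable when many markers overlap, for example in Europe or the Caribbean where capitals lie close together.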