sum(WorldCities$population)
## [1] 2592396767
My second plausibility test is to find the city with the highest population, and see if the number seems reasonable.
max(WorldCities$population)
## [1] 14608512
14.6 million seems like a relatively plausible number for the largest city in the world.
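A quick way to see which city that maximum belongs to, sketched in base R. The data frame `cities_sample` is a hypothetical three-row stand-in for `WorldCities`, with values copied from the table printed further down; on the full data the same `which.max()` call should apply unchanged.

```r
# Hypothetical three-row sample; the values are copied from the output in this report.
cities_sample <- data.frame(
  name = c("Buenos Aires", "Dhaka", "Shanghai"),
  country = c("AR", "BD", "CN"),
  population = c(13076300, 10356500, 14608512),
  stringsAsFactors = FALSE
)

# which.max() returns the index of the first maximum value.
largest <- cities_sample[which.max(cities_sample$population), ]
print(largest$name)
# [1] "Shanghai"
```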
largecities = WorldCities %>%
  filter(population > 100000)
nrow(largecities)
## [1] 4266
largestcities = WorldCities %>%
  filter(population > 1000000)
nrow(largestcities)
## [1] 350
I’m going to go ahead and say scaling by area gives a more intuitive idea of the relative populations of the cities. If diameter were linearly proportional to population, area would be quadratically proportional, which would exaggerate the differences between cities.
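The arithmetic behind that claim can be sketched in a few lines of base R. The two populations (1 and 2 million) are made-up round numbers, and the proportionality constants are assumed to be 1 for simplicity, since only the ratios matter.

```r
# Hypothetical populations in millions; the second city is twice the first.
pop <- c(small = 1, large = 2)

# Radius proportional to population: area = pi * r^2 grows with population squared.
area_if_radius_scaled <- pi * pop^2

# Area proportional to population: the circle's area is the population itself.
area_if_area_scaled <- pop

# How many times bigger the larger city's circle looks in each case.
ratio_radius_scaled <- unname(area_if_radius_scaled["large"] / area_if_radius_scaled["small"])
ratio_area_scaled <- unname(area_if_area_scaled["large"] / area_if_area_scaled["small"])

print(c(radius_scaled = ratio_radius_scaled, area_scaled = ratio_area_scaled))
```

With radius (or diameter) scaling, a city with twice the population gets a circle with four times the area; with area scaling it gets twice the area. As far as I can tell, ggplot2's default `scale_size()` already maps the value to point area, while `scale_radius()` maps it to radius.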
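# Largest city in each country (ties keep every city sharing the maximum)
BiggestByCountry =
  WorldCities %>%
  group_by(country) %>%
  filter(population == max(population))
# Plot cities over 100,000 sized by population; mark and label each country's largest city in red
ggplot(data = largecities, aes(x = longitude, y = latitude)) +
  geom_point(alpha = 0.5, aes(size = population)) +
  geom_point() +
  geom_point(data = BiggestByCountry, colour = "red") +
  geom_text(data = BiggestByCountry, size = 2, aes(label = name))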
bigbigcities =
  WorldCities %>%
  group_by(country) %>%
  filter(population > 5000000) %>%
  select(name, country, population)
bigbigcities
## Source: local data frame [35 x 3]
## Groups: country [25]
##
##              name country population
##             (chr)   (chr)      (dbl)
## 1    Buenos Aires      AR   13076300
## 2           Dhaka      BD   10356500
## 3       Sao Paulo      BR   10021295
## 4  Rio de Janeiro      BR    6023699
## 5        Kinshasa      CD    7785965
## 6       Zhumadian      CN    8263100
## 7          Tai'an      CN    5499000
## 8        Shanghai      CN   14608512
## 9        Nanchong      CN    7150000
## 10        Beijing      CN    7480601
## ..            ...     ...        ...
The data verbs I used in this activity are as follows: arrange, filter, group_by, join, mutate, summarise, and select.
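Since `summarise` and `arrange` do not appear in the code above, here is a small sketch of how they could be combined with `group_by`. It assumes dplyr is installed; `sample_cities` is a hypothetical stand-in for `WorldCities` that holds only the ten rows printed above, and the column names `n_cities` and `largest` are my own.

```r
library(dplyr)

# The ten rows printed above, re-entered by hand as a stand-in for WorldCities.
sample_cities <- data.frame(
  name = c("Buenos Aires", "Dhaka", "Sao Paulo", "Rio de Janeiro", "Kinshasa",
           "Zhumadian", "Tai'an", "Shanghai", "Nanchong", "Beijing"),
  country = c("AR", "BD", "BR", "BR", "CD", "CN", "CN", "CN", "CN", "CN"),
  population = c(13076300, 10356500, 10021295, 6023699, 7785965,
                 8263100, 5499000, 14608512, 7150000, 7480601),
  stringsAsFactors = FALSE
)

# Count the listed cities per country, record each country's largest population,
# then sort by the count and break ties by the largest population.
per_country <- sample_cities %>%
  group_by(country) %>%
  summarise(n_cities = n(), largest = max(population)) %>%
  arrange(desc(n_cities), desc(largest))

print(per_country)
```

On this ten-row sample, CN comes first with five cities, followed by BR with two.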