⇒ WorldCities.Rmd

  1. The data table seems plausible. Summing the population of every city gives approximately 2.6 billion, which seems like a trustworthy figure for the top 23,000 cities.
sum(WorldCities$population)
## [1] 2592396767

My second plausibility test is to find the highest city population and check whether the number seems reasonable.

max(WorldCities$population)
## [1] 14608512

14.6 million seems like a relatively plausible number for the largest city in the world.
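
To see which city that maximum belongs to, one can pull the whole row rather than just the number. This is a minimal base-R sketch, assuming a small stand-in data frame in place of the full `WorldCities` table. The name `toy_cities` is hypothetical, and its three rows are copied from the table printed later in this report.

```r
# Hypothetical stand-in for WorldCities; rows copied from this report's own output.
toy_cities <- data.frame(
  name       = c("Buenos Aires", "Dhaka", "Shanghai"),
  country    = c("AR", "BD", "CN"),
  population = c(13076300, 10356500, 14608512),
  stringsAsFactors = FALSE
)

# which.max() returns the index of the first maximum, so this selects one row.
largest <- toy_cities[which.max(toy_cities$population), ]
print(largest)
```

On the real table the same idea would be `WorldCities[which.max(WorldCities$population), ]`, assuming the `name` and `population` columns used elsewhere in this report.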

# Cities with more than 100,000 inhabitants
largecities <- WorldCities %>%
  filter(population > 100000)
nrow(largecities)
## [1] 4266
# Cities with more than 1,000,000 inhabitants
largestcities <- WorldCities %>%
  filter(population > 1000000)
nrow(largestcities)
## [1] 350

I’m going to say that scaling by area gives a more intuitive idea of the relative populations of the cities. If diameter is linearly proportional to population, then area is quadratically proportional to it, which is misleading: a city with twice the population would be drawn with four times the area.
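
The arithmetic behind that claim can be checked directly. This is a small sketch with two hypothetical cities, one with four times the population of the other. The populations and variable names are made up for illustration, and constant factors such as π/4 are dropped because only ratios matter.

```r
# Two hypothetical cities; the second has four times the population of the first.
population <- c(small = 1e6, large = 4e6)

# Option A: diameter proportional to population, so area grows with its square.
diameter_a <- population / population[["small"]]
area_a     <- diameter_a^2

# Option B: area proportional to population, so diameter grows with its square root.
area_b     <- population / population[["small"]]
diameter_b <- sqrt(area_b)

# How much bigger the larger city's circle looks under each option.
area_ratio_a <- area_a[["large"]] / area_a[["small"]]
area_ratio_b <- area_b[["large"]] / area_b[["small"]]
print(c(diameter_scaling = area_ratio_a, area_scaling = area_ratio_b))
```

Under option A the larger city covers 16 times the area for 4 times the population, while under option B the area ratio matches the population ratio.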

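# Largest city in each country
BiggestByCountry <-
  WorldCities %>%
  group_by(country) %>%
  filter(population == max(population))

# Map the cities over 100,000: point size follows population,
# and the largest city of each country is marked in red and labelled
ggplot(data = largecities, aes(x = longitude, y = latitude)) +
  geom_point(alpha = 0.5, aes(size = population)) +
  geom_point() +
  geom_point(data = BiggestByCountry, colour = "red") +
  geom_text(data = BiggestByCountry, size = 2, aes(label = name))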
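
The grouped filter used for `BiggestByCountry` keeps, within each country, every row whose population equals that country's maximum. The same idea can be sketched in base R with `ave()`, assuming a hypothetical five-row stand-in (`toy_cities`, with rows copied from the table printed in this report) instead of the full data.

```r
# Hypothetical stand-in for WorldCities; rows copied from this report's own output.
toy_cities <- data.frame(
  name       = c("Buenos Aires", "Sao Paulo", "Rio de Janeiro", "Shanghai", "Beijing"),
  country    = c("AR", "BR", "BR", "CN", "CN"),
  population = c(13076300, 10021295, 6023699, 14608512, 7480601),
  stringsAsFactors = FALSE
)

# ave() computes the maximum within each country and repeats it on every row,
# so the comparison below is made group by group.
country_max <- ave(toy_cities$population, toy_cities$country, FUN = max)

# Keep the rows that hold their country's maximum (ties would all be kept).
biggest_toy <- toy_cities[toy_cities$population == country_max, ]
print(biggest_toy)
```

Because the comparison is an equality test, two cities tied for the maximum in one country would both be kept.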

# Cities with more than 5,000,000 inhabitants, keeping only the key columns
bigbigcities <-
  WorldCities %>%
  group_by(country) %>%
  filter(population > 5000000) %>%
  select(name, country, population)
bigbigcities
## Source: local data frame [35 x 3]
## Groups: country [25]
## 
##              name country population
##             (chr)   (chr)      (dbl)
## 1    Buenos Aires      AR   13076300
## 2           Dhaka      BD   10356500
## 3       Sao Paulo      BR   10021295
## 4  Rio de Janeiro      BR    6023699
## 5        Kinshasa      CD    7785965
## 6       Zhumadian      CN    8263100
## 7          Tai'an      CN    5499000
## 8        Shanghai      CN   14608512
## 9        Nanchong      CN    7150000
## 10        Beijing      CN    7480601
## ..            ...     ...        ...

The data verbs I used in this activity are as follows: arrange, filter, group_by, join, mutate, summarise, and select.