In this example, I am using a geographical data frame, geonames.cities, that contains country codes:
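library(plyr) # load plyr before dplyr so dplyr's verbs are not masked
library(dplyr)
library(ggplot2)
geonames.cities %>%
  select(countrycode) %>%
  head()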
## countrycode
## 1 AD
## 2 AD
## 3 AE
## 4 AE
## 5 AE
## 6 AE
Here we find the 10 countries with the most rows in the data frame, along with each country's share of the total.
.var <- .(countrycode) # grouping variable, quoted with plyr's .()
len <- nrow(geonames.cities) # total rows in data frame
result <-
  geonames.cities %>%
  ddply(.var, summarize, freq = length(eval(as.name(as.character(.var))))) %>% # count rows per country code
  mutate(prop = freq / len * 100) %>% # percentage of all rows for each country
  arrange(desc(freq)) %>% # sort by frequency, highest first
  head(10) # top 10
result %>%
knitr::kable()
| countrycode | freq | prop |
|---|---|---|
| US | 2973 | 12.643531 |
| IN | 2446 | 10.402313 |
| BR | 1201 | 5.107596 |
| RU | 1089 | 4.631284 |
| DE | 1057 | 4.495194 |
| CN | 800 | 3.402229 |
| JP | 737 | 3.134303 |
| GB | 711 | 3.023731 |
| FR | 633 | 2.692013 |
| IT | 575 | 2.445352 |
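
For comparison, the same kind of summary can be sketched with dplyr alone. This is only an illustration, not the method used above: `cities` is a hypothetical stand-in for geonames.cities, and `top_countries` is a name introduced here. It relies on `count()` to tally rows per group.

```r
library(dplyr)

# Hypothetical stand-in for geonames.cities (assumption: one row per city)
cities <- data.frame(
  countrycode = c("US", "US", "US", "IN", "IN", "BR"),
  stringsAsFactors = FALSE
)

top_countries <- cities %>%
  count(countrycode, name = "freq") %>%      # rows per country code
  mutate(prop = freq / sum(freq) * 100) %>%  # percentage of all rows
  arrange(desc(freq)) %>%                    # highest frequency first
  head(10)                                   # keep at most 10 countries
```

Because `count()` covers every row, `sum(freq)` here equals the total row count, so `prop` matches the `freq / len * 100` calculation used in the ddply version.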
Let’s plot the result as a graph.
result %>%
ggplot() +
geom_bar(aes(x = countrycode, y = prop), fill = "royalblue", stat = "identity")
Note that the x axis is sorted alphabetically by default. Let’s change this so that the bars are sorted by freq.
result %>%
mutate(countrycode = factor(countrycode, countrycode)) %>% #retain the sort order
ggplot() +
geom_bar(aes(x = countrycode, y = prop), fill = "royalblue", stat = "identity") +
geom_text(aes(
x = countrycode,
y = prop,
label = paste0(round(prop, 2), "%")
), hjust = 0, size = 2) +
scale_y_continuous(
  labels = function(x) paste0(round(x), "%"), # show axis ticks as percentages
  expand = c(0.1, 0.1)
) +
coord_flip()
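
Setting the factor levels from the rows works because result is already sorted by freq. As an alternative sketch, `reorder()` ties the bar order directly to the plotted value, so it does not depend on row order. The `demo` data frame below is a hypothetical miniature of result with rounded values, used only for illustration.

```r
library(ggplot2)

# Hypothetical miniature of `result`, deliberately not sorted by prop
demo <- data.frame(
  countrycode = c("IN", "US", "BR"),
  prop = c(10.4, 12.6, 5.1)
)

# reorder() returns a factor whose levels are sorted by prop (ascending)
demo$countrycode <- reorder(demo$countrycode, demo$prop)

p <- ggplot(demo) +
  geom_col(aes(x = countrycode, y = prop), fill = "royalblue") +
  coord_flip() # after flipping, the largest value is drawn at the top
```

`geom_col()` is used here as shorthand for `geom_bar(stat = "identity")`; printing `p` draws the chart.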