Sorting Dataframes in ggplot and R



Create a data frame

Often you need to plot data in a bar chart and also want the bars ordered by the values in one of the data frame's columns.
As an example, create a sample data frame of football teams and their points in the Premier League:

# sample data: points for ten Premier League teams, highest first
team = c("Manchester City", "Manchester United", "Chelsea", "Tottenham", 
          "Liverpool", "Burnley", "Arsenal", "Leicester", 
          "Watford", "Everton")
points = c(49, 38, 35, 31, 31, 31, 30, 26, 22, 22)
df1 = data.frame(team, points)
head(df1)
##                team points
## 1   Manchester City     49
## 2 Manchester United     38
## 3           Chelsea     35
## 4         Tottenham     31
## 5         Liverpool     31
## 6           Burnley     31
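The sample data is already in points order, but if you need to sort a data frame by a column yourself, base R's order() does it. This is a minimal sketch, assuming you want descending points with ties broken alphabetically by team (the tie-break rule is an illustrative choice, not part of the original); the name df_sorted is hypothetical:

```r
# Rebuild the sample data so this snippet runs on its own
team <- c("Manchester City", "Manchester United", "Chelsea", "Tottenham",
          "Liverpool", "Burnley", "Arsenal", "Leicester",
          "Watford", "Everton")
points <- c(49, 38, 35, 31, 31, 31, 30, 26, 22, 22)
df1 <- data.frame(team, points)

# order() returns the row indices that would sort the data:
# -points gives descending points, team breaks ties alphabetically
# (df_sorted is a hypothetical, illustrative name)
df_sorted <- df1[order(-df1$points, df1$team), ]
rownames(df_sorted) <- NULL  # renumber rows 1..n after sorting
print(df_sorted)
```

Sorting the rows like this does not change how ggplot2 orders the bars; that is controlled by the factor levels, as the rest of this post shows.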

So the data frame is in order of points, but when we plot the bar chart, ggplot2 does not follow the order in which the rows appear. Instead, the teams are placed in alphabetical order by default, which is not what we want.
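To see the order ggplot2 falls back on, you can look at the levels factor() assigns to the team names. This small illustration assumes default factor() behaviour; ggplot2 places the values of a character column on a discrete axis in the same alphabetical order. The variable default_levels is hypothetical:

```r
team <- c("Manchester City", "Manchester United", "Chelsea", "Tottenham",
          "Liverpool", "Burnley", "Arsenal", "Leicester",
          "Watford", "Everton")

# factor() sorts its levels alphabetically unless told otherwise;
# this is the order the bars appear in the default plot
# (default_levels is a hypothetical, illustrative name)
default_levels <- levels(factor(team))
print(default_levels)
```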

library(ggplot2)
# base layer: points go on the value axis; team is mapped in the geom layer
p <- ggplot(df1, aes(y = points))

# default plot: the teams are ordered alphabetically, not by points
# (geom_col() is shorthand for geom_bar(stat = "identity"))
p + geom_col(aes(x = team)) + ggtitle("Premier League") + coord_flip()

Reorder the plot

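To have the plot drawn in order of points, we must specify the order explicitly by turning team into a factor whose levels are listed in ascending order of points:

# set the factor levels of team in ascending order of points
df1$team <- factor(df1$team, levels = df1$team[order(df1$points)])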
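An alternative to building the levels by hand is reorder() from the stats package, which R attaches by default. This sketch assumes one row per team, so reordering by the mean of points (reorder()'s default summary function) is the same as ordering by points. The relative order of the three teams tied on 31 points is not something to rely on:

```r
team <- c("Manchester City", "Manchester United", "Chelsea", "Tottenham",
          "Liverpool", "Burnley", "Arsenal", "Leicester",
          "Watford", "Everton")
points <- c(49, 38, 35, 31, 31, 31, 30, 26, 22, 22)
df1 <- data.frame(team, points)

# reorder() returns a factor whose levels are sorted by the mean of
# points within each team; with one row per team that is ascending points
df1$team <- reorder(df1$team, df1$points)
print(levels(df1$team))
```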

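Now the plot is drawn in the expected order. With coord_flip(), the first factor level sits at the bottom of the axis, so ascending levels put the team with the most points at the top: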

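# rebuild the plot: ggplot() keeps its own copy of the data,
# so the earlier p still holds the unsorted team column
p <- ggplot(df1, aes(y = points))

p + geom_col(aes(x = team)) + ggtitle("Premier League, sorted by points") + coord_flip()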
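If you prefer vertical bars without coord_flip(), the levels need to run in descending order instead, because the first level is drawn at the left. A sketch under that assumption; the theme() call simply rotates the x-axis labels so the long team names do not overlap, and the plot title is illustrative:

```r
library(ggplot2)

team <- c("Manchester City", "Manchester United", "Chelsea", "Tottenham",
          "Liverpool", "Burnley", "Arsenal", "Leicester",
          "Watford", "Everton")
points <- c(49, 38, 35, 31, 31, 31, 30, 26, 22, 22)
df1 <- data.frame(team, points)

# descending levels: the highest-scoring team becomes the leftmost bar
df1$team <- factor(df1$team,
                   levels = df1$team[order(df1$points, decreasing = TRUE)])

p <- ggplot(df1, aes(x = team, y = points)) +
  geom_col() +
  ggtitle("Premier League, sorted by points (descending)") +
  # rotate the long team names so they do not overlap
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p)
```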


Alan Brown, CTO, Tendron Systems Ltd
