About the example

This report is an example data visualisation task for the Coursera “Data Visualization” course provided by the University of Illinois at Urbana-Champaign.

The example visualisation is generated with the R language and the ggplot2 package. The report is reproducible: everything in it can be reproduced by following the instructions for installing the required packages. All the code used to generate the visualisations is given in the code blocks.

R packages used

The following R packages were used to generate this report.

# When running the code for the first time, uncomment the line below
# to install the required packages
# install.packages(c("dplyr", "reshape2", "ggplot2", "ggthemes", "knitr"))

library(dplyr, warn.conflicts = FALSE)
library(reshape2)
library(ggplot2)
library(ggthemes)
library(knitr)
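Instead of uncommenting the install commands by hand, the first-run step could be automated. This is a minimal sketch, assuming the five packages loaded above are the full set required. The helper name `missing_packages` is hypothetical, and installation is only attempted in an interactive session so that a batch run never reaches for the network unexpectedly.

```r
# Packages loaded by this report
required <- c("dplyr", "reshape2", "ggplot2", "ggthemes", "knitr")

# Hypothetical helper: returns the entries of `pkgs` that are not installed
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- missing_packages(required)

# Install only when a user is present (interactive() is FALSE under Rscript)
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

Passing `installed` explicitly keeps the helper a pure function, which makes it easy to check without touching the real library.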

Data

For this data visualisation assignment, the data is downloaded from the following URL:

# Download and unzip the data set if it does not exist already
url <- "https://d396qusza40orc.cloudfront.net/datavisualization/programming_assignment_1/Data.zip"
if (!file.exists("Data.zip")) {
  # mode = "wb" keeps the zip file intact on Windows
  download.file(url, destfile = "Data.zip", mode = "wb")
}
if (!file.exists("data/ExcelFormattedGISTEMPDataCSV.csv")) {
  unzip(zipfile = "Data.zip")
}

Let's first load the data and inspect its structure using R's standard tools.

df <- read.csv("data/ExcelFormattedGISTEMPDataCSV.csv",na.strings = c("****","***"))
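The `na.strings` argument matters because the raw file marks missing values with asterisks rather than `NA`. The sketch below reproduces this on a two-row inline CSV built from the 1880 and 1881 values in the table that follows. Two details are assumptions: that the raw header is written `J-D` and `D-N`, and that the missing 1880 `D-N` cell holds `***`. It also shows why `J.D` appears in the printed table: `read.csv` converts column names into syntactically valid R names.

```r
# Illustrative rows (values from the 1880 and 1881 rows of the data);
# assumes the raw header uses "J-D"/"D-N" and the missing cell holds "***"
csv_text <- "Year,Dec,J-D,D-N
1880,-23,-22,***
1881,-18,-14,-14"

# Without na.strings the asterisks stop the D-N column from being numeric
raw <- read.csv(text = csv_text)

# With na.strings, "***" and "****" become NA and the column stays numeric
clean <- read.csv(text = csv_text, na.strings = c("****", "***"))

str(clean)
```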

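As we can see, the data consist of monthly time series values, one row per year. The remaining columns hold the annual means (J-D for January to December, D-N for December to November) and the seasonal means (DJF, MAM, JJA, SON).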

kable(head(df))
Year  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec  J.D  D.N  DJF  MAM  JJA  SON
1880  -34  -26  -22  -29  -16  -23  -20  -12  -21  -20  -17  -23  -22   NA   NA  -22  -18  -19
1881  -12  -16   -1   -2   -2  -26  -11   -7  -16  -23  -27  -18  -14  -14  -17   -2  -15  -22
1882    3    5   -1  -22  -19  -32  -26  -10  -11  -24  -24  -36  -17  -15   -3  -14  -23  -20
1883  -38  -37  -12  -19  -20   -8   -3  -14  -20  -18  -28  -20  -20  -21  -37  -17   -8  -22
1884  -21  -15  -30  -35  -34  -37  -32  -25  -28  -25  -29  -26  -28  -27  -19  -33  -31  -27
1885  -57  -29  -19  -36  -34  -40  -27  -24  -17  -14  -13    1  -26  -28  -37  -30  -30  -15

Data visualisation

The aim of this visualisation is to present the monthly temperature deviations in a simple, easily comparable form. For this we chose a line plot for each month with a smoothed trend line (in blue).

Next, we subset the data to the year and the twelve monthly deviation columns, and reshape it into long format with one row per year and month.

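# Keep the year and the twelve monthly columns
df2 <- df[, c("Year", "Jan", "Feb", "Mar", "Apr", "May",
              "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")]
# Reshape from wide to long: one row per (Year, Month) pair
monthly_data <- melt(df2, id.vars = "Year")
# Drop months without a measured value
monthly_data <- na.omit(monthly_data)
names(monthly_data) <- c("Year", "Month", "Temperature_deviation")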
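To make the wide-to-long step concrete, here is a base R sketch of what `melt()` does when `Year` is the id variable: the month columns are stacked into a single value column, and a `Month` factor records which column each value came from. It uses only the January to March values of 1880 and 1881 from the table above, and is meant as an illustration of the reshaping, not as a replacement for reshape2.

```r
# Illustrative wide data: Jan-Mar deviations for 1880 and 1881 (from the table above)
wide <- data.frame(Year = c(1880, 1881),
                   Jan  = c(-34, -12),
                   Feb  = c(-26, -16),
                   Mar  = c(-22, -1))
months <- c("Jan", "Feb", "Mar")

# Stack the month columns one after another, as melt() does:
# all Jan values first, then all Feb values, then all Mar values
long <- data.frame(
  Year  = rep(wide$Year, times = length(months)),
  Month = factor(rep(months, each = nrow(wide)), levels = months),
  Temperature_deviation = unlist(wide[months], use.names = FALSE)
)

# Here equivalent to na.omit(): drop rows without a measured value
long <- long[!is.na(long$Temperature_deviation), ]
long
```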

# Generating the graphs: one panel per month, raw line plus a GAM smoother
p <- ggplot(data = monthly_data,
            aes(x = Year, y = Temperature_deviation))
p + geom_line() +                                  # deviation for each year
    stat_smooth(method = "gam", level = 0.9999) +  # smoothed trend (blue) with confidence band
    facet_wrap(~ Month, ncol = 4) +                # 12 panels, 4 per row
    theme_tufte(base_size = 12) +
    labs(title = "Temperature deviations for monthly averages over years")
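The `level = 0.9999` setting makes the shaded confidence band around the blue trend unusually wide. Assuming the band is a normal-approximation interval (fitted value ± z × standard error), the multiplier for this level can be compared with that of the default `level = 0.95`:

```r
# Two-sided z multipliers for the default and the chosen confidence levels
conf_levels <- c(default = 0.95, report = 0.9999)
z <- setNames(qnorm(1 - (1 - conf_levels) / 2), names(conf_levels))
round(z, 2)

# Ratio of band widths: the 99.99% band is roughly twice as wide as the 95% band
round(z[["report"]] / z[["default"]], 2)
```

Under that assumption, a very high level mainly signals how uncertain the smoother is at the ends of the series, where the band widens most.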

The following observations can be drawn from the visualisation:

  • Temperatures have increased over the years in every month.
  • The warmer months (May, June, July, August) show smaller fluctuations in temperature deviation than the colder months (January, February, March).
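The second observation can also be checked numerically. One possible measure of year-to-year fluctuation (an assumption, not the report's own method) is the standard deviation of the differences between consecutive years, which largely removes the slow warming trend. The helper `fluctuation_by_month` is hypothetical and expects a long data frame shaped like `monthly_data`. The toy input uses only the 1880 to 1885 January and July values from the table above, so it demonstrates the computation, not the conclusion.

```r
# Hypothetical helper: spread of year-to-year changes for each month.
# Expects columns Year, Month and Temperature_deviation (as in monthly_data).
fluctuation_by_month <- function(long) {
  # Sort by month, then year, so diff() compares consecutive years
  long <- long[order(long$Month, long$Year), ]
  tapply(long$Temperature_deviation, long$Month,
         function(x) sd(diff(x)))
}

# Toy input: January and July deviations for 1880-1885 (from the table above)
toy <- data.frame(
  Year  = rep(1880:1885, times = 2),
  Month = factor(rep(c("Jan", "Jul"), each = 6), levels = c("Jan", "Jul")),
  Temperature_deviation = c(-34, -12, 3, -38, -21, -57,   # Jan
                            -20, -11, -26, -3, -32, -27)  # Jul
)

spread <- fluctuation_by_month(toy)
round(spread, 1)

# On the full data set: fluctuation_by_month(monthly_data)
```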