About the report

This report is an example data visualisation task for the Coursera “Data Visualization” course provided by the University of Illinois at Urbana-Champaign.

The report and its visualisations are generated with the R language and a D3-based HTML widget. The report is reproducible: everything in it can be regenerated by following the instructions for installing the required packages. All the code used to generate the visualisations is included in the code blocks.

Used R packages

The following R packages were used to generate this report.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(reshape2)
library(ggplot2)
library(ggthemes)
library(devtools)
library(streamgraph)
## Loading required package: htmlwidgets
## Loading required package: htmltools
# When running the code for the first time, uncomment this section and run the commands
# install.packages("dplyr")
# install.packages("devtools")
# install.packages("reshape2")
# install.packages("ggplot2")
# install.packages("ggthemes")
# devtools::install_github("hrbrmstr/streamgraph")

Data

For the data visualisation assignment, the data is downloaded from the following URL:

# Download and unzip the data set if it does not already exist
url <- "https://d396qusza40orc.cloudfront.net/datavisualization/programming_assignment_1/Data.zip"
if (!file.exists("Data.zip")) {
    download.file(url, destfile = "Data.zip", method = "curl")
    unzip(zipfile = "Data.zip")
}

Let's first load the data and inspect its structure with R's standard tools.

df <- read.csv("data/ExcelFormattedGISTEMPDataCSV.csv", na.strings = c("****", "***"))
str(df)
## 'data.frame':    136 obs. of  19 variables:
##  $ Year: int  1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 ...
##  $ Jan : int  -34 -12 3 -38 -21 -57 -38 -59 -43 -19 ...
##  $ Feb : int  -26 -16 5 -37 -15 -29 -43 -42 -40 16 ...
##  $ Mar : int  -22 -1 -1 -12 -30 -19 -33 -24 -39 6 ...
##  $ Apr : int  -29 -2 -22 -19 -35 -36 -22 -33 -24 7 ...
##  $ May : int  -16 -2 -19 -20 -34 -34 -21 -27 -26 -2 ...
##  $ Jun : int  -23 -26 -32 -8 -37 -40 -30 -24 -22 -11 ...
##  $ Jul : int  -20 -11 -26 -3 -32 -27 -12 -17 -11 -13 ...
##  $ Aug : int  -12 -7 -10 -14 -25 -24 -20 -27 -13 -19 ...
##  $ Sep : int  -21 -16 -11 -20 -28 -17 -11 -23 -10 -19 ...
##  $ Oct : int  -20 -23 -24 -18 -25 -14 -22 -33 -4 -22 ...
##  $ Nov : int  -17 -27 -24 -28 -29 -13 -27 -28 -2 -32 ...
##  $ Dec : int  -23 -18 -36 -20 -26 1 -17 -40 -11 -29 ...
##  $ J.D : int  -22 -14 -17 -20 -28 -26 -25 -31 -20 -11 ...
##  $ D.N : int  NA -14 -15 -21 -27 -28 -23 -29 -23 -10 ...
##  $ DJF : int  NA -17 -3 -37 -19 -37 -27 -39 -41 -5 ...
##  $ MAM : int  -22 -2 -14 -17 -33 -30 -25 -28 -30 4 ...
##  $ JJA : int  -18 -15 -23 -8 -31 -30 -21 -23 -15 -14 ...
##  $ SON : int  -19 -22 -20 -22 -27 -15 -20 -28 -5 -24 ...

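As we can see, the data is a time series with one row per year, starting in 1880, and one column per month. The remaining columns (J.D, D.N, DJF, MAM, JJA, SON) hold annual and seasonal aggregates and are not used below.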
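The `NA` entries in the `D.N` and `DJF` columns come from the `na.strings` argument passed to `read.csv`: missing values written as runs of asterisks are turned into `NA`, so the columns stay numeric. A minimal sketch on a made-up two-row extract (the inline text is illustrative, not the actual file):

```r
# Hypothetical two-row extract; "****" and "***" stand for missing values.
csv_text <- "Year,Jan,D.N,DJF
1880,-34,****,***
1881,-12,-14,-17"

# Without na.strings the asterisks prevent the columns from being numeric.
raw <- read.csv(text = csv_text)

# With na.strings they become NA and the columns are parsed as integers.
clean <- read.csv(text = csv_text, na.strings = c("****", "***"))

class(raw$D.N)    # not numeric (character in R >= 4.0, factor before)
class(clean$D.N)  # "integer"
is.na(clean$DJF)  # missing only in the first row
```

Listing both `"****"` and `"***"` matters because `na.strings` matches whole field values, so each asterisk width has to be named explicitly.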

For the visualisation, a streamgraph in interactive mode is used.

df2 <- df[,c("Year","Jan","Feb","Mar","Apr","May",
             "Jun","Jul","Aug","Sep","Oct","Nov","Dec")]
monthly_data <- melt(df2, id=c("Year")) 
names(monthly_data) <- c("Year","Month","Temperature")

streamgraph(data = monthly_data, key = "Month", value = "Temperature", date = "Year", interactive = TRUE) %>%
    sg_axis_x(10, "year", "%Y") %>%
    sg_fill_tableau() %>%
    sg_legend(show=TRUE, label="Month: ")
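The reshaping step above (`melt`) turns the wide table, with one column per month, into a long table with one row per year and month pair. As a rough illustration of what that produces, here is a base-R stand-in on a toy table; it is a sketch of the idea, not what `reshape2` does internally, and the names `wide`, `months` and `long` are made up for the example:

```r
# Toy wide table: first three years of Jan and Feb from the str() output.
wide <- data.frame(Year = c(1880, 1881, 1882),
                   Jan  = c(-34, -12, 3),
                   Feb  = c(-26, -16, 5))

months <- c("Jan", "Feb")

# Stack the month columns on top of each other and record, for every
# value, which year and which month column it came from.
long <- data.frame(
    Year        = rep(wide$Year, times = length(months)),
    Month       = factor(rep(months, each = nrow(wide)), levels = months),
    Temperature = unlist(wide[months], use.names = FALSE)
)

long                # 6 rows: one per (Year, Month) pair
levels(long$Month)  # levels follow the column order, not the alphabet
```

`melt` likewise returns the variable column as a factor whose levels follow the original column order, which is why plots that group by `Month` can show the months in calendar order.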

Another example is a bar chart over time.

First we visualise the January temperature data.

p <- ggplot(data = df)
p + geom_bar(aes(x = Year, 
                 y = Jan),
             stat = "identity") + 
    theme_tufte(ticks = FALSE) + 
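    # Pass labels as named arguments; wrapping them in list() is not
    # supported by current ggplot2 versions.
    labs(title = "January data",
         x = "Year",
         y = "Temperature")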
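The "Temperature" axis label does not say which units are plotted. The GISTEMP tables that this file appears to be derived from report temperature anomalies in hundredths of a degree Celsius; assuming the same holds here (the report does not state it), a conversion to degrees could be sketched as:

```r
# First five January values, copied from the str() output above.
jan_raw <- c(-34, -12, 3, -38, -21)

# Assumption: raw values are anomalies in 0.01 degrees Celsius.
jan_celsius <- jan_raw / 100
jan_celsius
```

If the assumption is right, plotting the converted values and labelling the axis "Temperature anomaly (°C)" would make the chart easier to read.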

A further example visualises the data for all 12 months as 12 small graphs, one per month.

# Preparing the data
df2 <- df[,c("Year","Jan","Feb","Mar","Apr","May",
             "Jun","Jul","Aug","Sep","Oct","Nov","Dec")]
monthly_data <- melt(df2, id = c("Year")) 
names(monthly_data) <- c("Year","Month","Temperature")

# Generating the graphs
p <- ggplot(data = monthly_data)
p + geom_path(aes(x = Year, 
                  y = Temperature)) + 
    facet_wrap( ~ Month, ncol = 4 ) +
    theme_tufte(ticks = FALSE,base_size = 12) + 
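    # Pass labels as named arguments; wrapping them in list() is not
    # supported by current ggplot2 versions.
    labs(title = "Temperature values for months over years",
         x = "Year",
         y = "Value")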