This report is an example data visualisation task for the Coursera “Data Visualization” course provided by the University of Illinois at Urbana-Champaign.
The report and its visualisations are generated in the R language with an HTML d3 widget. The report is reproducible, i.e. everything written here can be reproduced by following the instructions for installing the required packages. All the code used to generate the visualisations is given here in the code blocks.
To generate this report, the following R packages were used.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(reshape2)
library(ggplot2)
library(ggthemes)
library(devtools)
library(streamgraph)
## Loading required package: htmlwidgets
## Loading required package: htmltools
# To run the code for the first time, uncomment this section and run the commands
# install.packages("dplyr")
# install.packages("devtools")
# install.packages("reshape2")
# install.packages("ggplot2")
# install.packages("ggthemes")
# devtools::install_github("hrbrmstr/streamgraph")
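As a small convenience, the check for packages that still need installing can be sketched as below. The helper name `missing_packages` is hypothetical (it is not part of the original report), and the install commands are left commented out, as in the section above, so that nothing is installed by accident.

```r
# Hypothetical helper: return the names of packages that are not installed yet.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE when the package can be loaded, FALSE otherwise
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# CRAN packages used in this report
cran_pkgs <- c("dplyr", "devtools", "reshape2", "ggplot2", "ggthemes")
to_install <- missing_packages(cran_pkgs)
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
# if (length(missing_packages("streamgraph")) > 0) {
#   devtools::install_github("hrbrmstr/streamgraph")
# }
```

This keeps repeated runs of the report fast, because packages that are already present are skipped.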
For the data visualisation assignment, the data is downloaded from the following web URL.
# download and unzip the data set if it does not exist already
url <- "https://d396qusza40orc.cloudfront.net/datavisualization/programming_assignment_1/Data.zip"
if (!file.exists("Data.zip")) {
  download.file(url, destfile = "Data.zip", method = "curl")
  unzip(zipfile = "Data.zip")
}
Let's first load the data and inspect its structure with R's standard tools.
df <- read.csv("data/ExcelFormattedGISTEMPDataCSV.csv",na.strings = c("****","***"))
str(df)
## 'data.frame': 136 obs. of 19 variables:
## $ Year: int 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 ...
## $ Jan : int -34 -12 3 -38 -21 -57 -38 -59 -43 -19 ...
## $ Feb : int -26 -16 5 -37 -15 -29 -43 -42 -40 16 ...
## $ Mar : int -22 -1 -1 -12 -30 -19 -33 -24 -39 6 ...
## $ Apr : int -29 -2 -22 -19 -35 -36 -22 -33 -24 7 ...
## $ May : int -16 -2 -19 -20 -34 -34 -21 -27 -26 -2 ...
## $ Jun : int -23 -26 -32 -8 -37 -40 -30 -24 -22 -11 ...
## $ Jul : int -20 -11 -26 -3 -32 -27 -12 -17 -11 -13 ...
## $ Aug : int -12 -7 -10 -14 -25 -24 -20 -27 -13 -19 ...
## $ Sep : int -21 -16 -11 -20 -28 -17 -11 -23 -10 -19 ...
## $ Oct : int -20 -23 -24 -18 -25 -14 -22 -33 -4 -22 ...
## $ Nov : int -17 -27 -24 -28 -29 -13 -27 -28 -2 -32 ...
## $ Dec : int -23 -18 -36 -20 -26 1 -17 -40 -11 -29 ...
## $ J.D : int -22 -14 -17 -20 -28 -26 -25 -31 -20 -11 ...
## $ D.N : int NA -14 -15 -21 -27 -28 -23 -29 -23 -10 ...
## $ DJF : int NA -17 -3 -37 -19 -37 -27 -39 -41 -5 ...
## $ MAM : int -22 -2 -14 -17 -33 -30 -25 -28 -30 4 ...
## $ JJA : int -18 -15 -23 -8 -31 -30 -21 -23 -15 -14 ...
## $ SON : int -19 -22 -20 -22 -27 -15 -20 -28 -5 -24 ...
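As we can see, the data are a time series with one row per year (136 observations starting in 1880) and one column per month, followed by annual and seasonal summary columns (J.D, D.N, DJF, MAM, JJA, SON).
For the visualisation, a streamgraph in interactive mode is selected.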
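The `NA` entries in the structure above come from the `na.strings` argument of `read.csv`. The sketch below shows the effect on a tiny made-up CSV snippet; the text in `csv_text` is an illustrative assumption and is not taken from the real data file.

```r
# Made-up CSV text in which a missing value is written as "***"
csv_text <- "Year,Jan,D.N\n1880,-34,***\n1881,-12,-14\n"

# With na.strings, the placeholder is read as NA and the column stays integer
demo <- read.csv(text = csv_text, na.strings = c("****", "***"))
str(demo)

# Number of missing values per column
print(colSums(is.na(demo)))
```

Without `na.strings`, a column that contains such placeholders would be read as text rather than as integers.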
df2 <- df[, c("Year", "Jan", "Feb", "Mar", "Apr", "May",
              "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")]
monthly_data <- melt(df2, id = c("Year"))
names(monthly_data) <- c("Year", "Month", "Temperature")
streamgraph(data = monthly_data, "Month", "Temperature", "Year", interactive = TRUE) %>%
  sg_axis_x(10, "year", "%Y") %>%
  sg_fill_tableau() %>%
  sg_legend(show = TRUE, label = "Month: ")
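The reshaping step before the streamgraph call turns the wide table (one column per month) into a long table (one row per year and month). To make that step easier to follow, here is a minimal sketch of the same wide-to-long reshape on two rows, written in base R only so that it runs without reshape2. The variable names `wide`, `month_cols` and `long` are illustrative; the values are the first two rows of the Jan and Feb columns shown in the `str()` output.

```r
# Two years and two months, copied from the str() output above
wide <- data.frame(Year = c(1880L, 1881L),
                   Jan  = c(-34L, -12L),
                   Feb  = c(-26L, -16L))

# Every column except Year is treated as a month column
month_cols <- setdiff(names(wide), "Year")

# Stack the month columns on top of each other, repeating Year for each month
long <- data.frame(
  Year        = rep(wide$Year, times = length(month_cols)),
  Month       = factor(rep(month_cols, each = nrow(wide)), levels = month_cols),
  Temperature = unlist(wide[month_cols], use.names = FALSE)
)
print(long)
```

The result has one row per year and month, which is the layout that both the streamgraph and the faceted ggplot2 graphs expect.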
Another example of visualisation is a bar chart over the time period.
First, we visualise the January temperature data.
p <- ggplot(data = df)
p + geom_bar(aes(x = Year,
                 y = Jan),
             stat = "identity") +
  theme_tufte(ticks = FALSE) +
  # Labels are passed as named arguments, which works across ggplot2 versions
  labs(title = "January data",
       x = "Year",
       y = "Temperature")
Another example visualises the data for all 12 months in 12 small graphs.
# Preparing the data
df2 <- df[, c("Year", "Jan", "Feb", "Mar", "Apr", "May",
              "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")]
monthly_data <- melt(df2, id = c("Year"))
names(monthly_data) <- c("Year","Month","Temperature")
# Generating the graphs
p <- ggplot(data = monthly_data)
p + geom_path(aes(x = Year,
                  y = Temperature)) +
  facet_wrap(~ Month, ncol = 4) +
  theme_tufte(ticks = FALSE, base_size = 12) +
  # Labels are passed as named arguments, which works across ggplot2 versions
  labs(title = "Temperature values for months over years",
       x = "Year",
       y = "Value")