This is my Data Visualization Programming Assignment 01 product.
If you are a peer reviewer, feel free to jump straight to the “Chart” section; the rest of this document walks step by step through reading, cleaning and preparing the data and the plot using R notebooks and Plotly.
This assignment will give you a chance to explore the topics covered in Week 2 of the course by visualizing some data as a chart. The data set we provided deals with world temperatures and comes from NASA. You are welcome to use the additional resources, especially if you do not want to program to complete this project.
Take the data from the GISTEMP site, specifically the data from “Table Data: Global and Hemispheric Monthly Means and Zonal Annual Means.” Alternatively you can use any data that you would like to explore instead.
# loading dataset
gistemp <- read.csv("./data/ExcelFormattedGISTEMPData2CSV.csv")
dim(gistemp) # dimension
## [1] 135 15
str(gistemp) # what is this structure?
## 'data.frame': 135 obs. of 15 variables:
## $ Year : int 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 ...
## $ Glob : int -19 -10 -9 -19 -27 -31 -30 -33 -20 -11 ...
## $ NHem : int -33 -18 -17 -30 -42 -41 -39 -37 -22 -16 ...
## $ SHem : int -5 -2 -1 -8 -12 -21 -21 -28 -17 -6 ...
## $ X24N.90N: int -38 -27 -21 -34 -56 -61 -49 -46 -42 -25 ...
## $ X24S.24N: int -16 -2 -10 -22 -17 -17 -24 -27 7 4 ...
## $ X90S.24S: int -5 -5 4 -2 -11 -20 -20 -26 -33 -17 ...
## $ X64N.90N: int -89 -54 -125 -28 -127 -119 -124 -158 -141 -82 ...
## $ X44N.64N: int -54 -40 -20 -57 -58 -70 -43 -52 -43 -13 ...
## $ X24N.44N: int -22 -14 -3 -20 -41 -43 -38 -21 -22 -21 ...
## $ EQU.24N : int -26 -5 -12 -25 -21 -11 -24 -24 7 -3 ...
## $ X24S.EQU: int -5 2 -8 -19 -14 -23 -24 -31 8 11 ...
## $ X44S.24S: int -2 -6 3 -1 -15 -27 -18 -24 -30 -16 ...
## $ X64S.44S: int -8 -3 8 0 -5 -7 -21 -29 -38 -17 ...
## $ X90S.64S: int 39 37 42 37 40 38 28 21 16 19 ...
The dataset contains annual mean temperatures: apparently the global average (Glob), the Northern and Southern Hemisphere averages (NHem, SHem), and averages for several latitude zones.
Let’s characterize some of the data:
summary( gistemp[, c("Glob","NHem", "SHem")] )
## Glob NHem SHem
## Min. :-47.00 Min. :-52.000 Min. :-47.00000
## 1st Qu.:-20.00 1st Qu.:-21.500 1st Qu.:-22.50000
## Median : -8.00 Median : -2.000 Median : -9.00000
## Mean : 1.63 Mean : 3.326 Mean : -0.07407
## 3rd Qu.: 17.50 3rd Qu.: 16.000 3rd Qu.: 25.00000
## Max. : 75.00 Max. : 91.000 Max. : 59.00000
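These values are too high (or too low) to be temperatures in degrees Celsius. GISTEMP tables are normally published as temperature anomalies in hundredths of a degree Celsius (relative to a 1951-1980 baseline), which would explain the range, so the “°C” labels in the charts below should be read with that scale in mind.
Let’s define the chart type using the rules from the classes: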
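If the values are indeed stored in hundredths of a degree Celsius (an assumption; the CSV itself does not state its units), the rescaling could be sketched as below. The helper name `to_celsius` is hypothetical, and the input is just the first ten `Glob` values copied from the `str()` output.

```r
# Assumption: values are anomalies in units of 0.01 degrees Celsius.
# The CSV does not state this; adjust `scale` if the units differ.
to_celsius <- function(x, scale = 100) {
  x / scale
}

# First ten Glob values, copied from the str() output
glob_raw <- c(-19, -10, -9, -19, -27, -31, -30, -33, -20, -11)
glob_c <- to_celsius(glob_raw)
range(glob_c)
```

On the full data frame the same helper could be applied as `gistemp$GlobC <- to_celsius(gistemp$Glob)`, where `GlobC` is a made-up column name.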
Chart Selection
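In our case we have a discrete quantitative independent variable (Year) and a quantitative dependent variable (temperature).
So, following the recommendation table, we should use a bar chart.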
library(plotly)
library(magrittr)
# dataset and x var
chart <- plot_ly(gistemp, x=~Year) %>%
# Global Average
add_trace( y=~Glob, type = 'bar', name="Global Average") %>%
# naming the Y axis
layout( yaxis=list(title="Temperature °C", ticksuffix=" °C"),
title="Global Average Temperature")
chart
What are your x- and y-axes?
We are using Year on the x-axis and Global Average Temperature (Glob) on the y-axis.
Did you use a subset of the data? If so, what was it?
We are using all data points but only the Year and Glob columns. We tried plotting the other columns together in the same bar chart, but the information became difficult to see.
Are there any particular aspects of your visualization to which you would like to bring attention?
Although the independent variable (Year) is discrete quantitative, there are so many years (if you use all data points) that a line chart is also worth considering, treating Year as continuous quantitative without losing information.
What do you think the data and your visualization show?
The chart shows that the global average temperature oscillates a lot from year to year, but there has been a sustained increasing trend since around 1910.
In this section I explore other types of charts for this dataset, only to check the possibilities. If you are a peer reviewer, you can disregard the remainder of this document.
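One way to separate the year-to-year oscillation from the longer trend is a centered moving average. The sketch below uses base R only; the helper name `rolling_mean` and the 5-year window are illustrative choices, not part of the original analysis, and the input is the first ten `Glob` values from the `str()` output.

```r
# Centered moving average; the window k is an illustrative choice.
# Positions without a full window on both sides come back as NA.
rolling_mean <- function(x, k = 5) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

# First ten Glob values, copied from the str() output
glob_raw <- c(-19, -10, -9, -19, -27, -31, -30, -33, -20, -11)
smoothed <- rolling_mean(glob_raw, k = 5)
smoothed
```

Applied to the whole `gistemp$Glob` column, the smoothed series could be added to the chart as a second trace to make the trend easier to compare with the raw yearly values.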
Trying to see the same information treating Year as continuous data and choosing a line chart:
# dataset and x var
chart <- plot_ly(gistemp, x=~Year) %>%
# Global Average
add_trace( y=~Glob, type = 'scatter', mode="lines", name="Global Average") %>%
# naming the Y axis
layout( yaxis=list(title="Temperature °C", ticksuffix=" °C"),
title="Global Average Temperature")
chart
This is a good chart too. It is easy to assume that between data points there is a continuous change of temperature. We can clearly see the year-to-year oscillation and the increasing trend.
# dataset and x var
chart <- plot_ly(gistemp, x=~Year) %>%
# North Hemisphere Temperature
add_trace( y=~NHem, type = 'bar', name="North Hemisphere") %>%
# Global Average
add_trace( y=~Glob, type = 'bar', name="Global Average") %>%
# South Hemisphere Temperature
add_trace( y=~SHem, type = 'bar', name="South Hemisphere") %>%
# naming the Y axis
layout( yaxis=list(title="Temperature", ticksuffix=" °C"))
chart
This chart is bad. It is difficult to see any relationship between the temperatures; basically we only see, with added noise, the same information as when plotting the global average alone.
Maybe if we plot the differences from the global average we can see something different.
# removing global average from North and South Hem. Temperatures
gistemp$NorthDiff = gistemp$NHem - gistemp$Glob
gistemp$SouthDiff = gistemp$SHem - gistemp$Glob
# dataset and x var
chart <- plot_ly(gistemp, x=~Year) %>%
# North Hemisphere Temperature
add_trace( y=~NorthDiff, type = 'bar', name="North Hemisphere") %>%
# South Hemisphere Temperature
add_trace( y=~SouthDiff, type = 'bar', name="South Hemisphere") %>%
# naming the Y axis
layout( yaxis=list(title="Temperature Amplitude", ticksuffix=" °C"),
title="Temperature Difference from Global Average")
chart
Now we can see a pattern that can be explored! For several years in a row, the South Hemisphere is hotter than the North Hemisphere relative to the global average (from 1880 to 1920 and from 1969 to 1992), and in other intervals the North is hotter than the South. Is this a real pattern?
Can we see this pattern in other columns? Let’s look at the difference between North and South at different latitudes:
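The intervals can also be extracted programmatically rather than read off the chart. The sketch below collapses consecutive years with the same "warmer" hemisphere into runs using base R's `rle()`. The helper name `hemisphere_runs` is hypothetical and the toy input is made up purely to show the shape of the output.

```r
# Label each year by which hemisphere has the larger value, then
# collapse consecutive years with the same label into runs.
hemisphere_runs <- function(year, north, south) {
  warmer <- ifelse(north > south, "North",
                   ifelse(south > north, "South", "Tie"))
  runs <- rle(warmer)
  end <- cumsum(runs$lengths)
  start <- end - runs$lengths + 1
  data.frame(
    warmer = runs$values,
    from = year[start],
    to = year[end],
    years = runs$lengths,
    stringsAsFactors = FALSE
  )
}

# Toy input (made-up values, only to show the output shape)
toy_runs <- hemisphere_runs(
  year  = 2001:2006,
  north = c(1, 2, 0, 0, 3, 3),
  south = c(0, 0, 1, 1, 1, 3)
)
toy_runs
```

On the real data this could be called as `hemisphere_runs(gistemp$Year, gistemp$NHem, gistemp$SHem)`. Comparing `NHem` with `SHem` directly gives the same labels as comparing `NorthDiff` with `SouthDiff`, because both differences subtract the same `Glob` value.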
gistemp$diff.EQU.24 <- gistemp$EQU.24N - gistemp$X24S.EQU
gistemp$diff.24.44 <- gistemp$X24N.44N - gistemp$X44S.24S
gistemp$diff.44.64 <- gistemp$X44N.64N - gistemp$X64S.44S
gistemp$diff.64.90 <- gistemp$X64N.90N - gistemp$X90S.64S
# dataset and x var
chart <- plot_ly(gistemp, x=~Year) %>%
# Equator to 24 degrees
add_trace( y=~diff.EQU.24, type = 'scatter', mode="lines", name="EQ-24") %>%
# 24 to 44 degrees
add_trace( y=~diff.24.44, type = 'scatter', mode="lines", name="24-44") %>%
# 44 to 64 degrees
add_trace( y=~diff.44.64, type = 'scatter', mode="lines", name="44-64") %>%
# 64 to 90 degrees
add_trace( y=~diff.64.90, type = 'scatter', mode="lines", name="64-90") %>%
# naming the Y axis
layout( yaxis=list(title="Temperature Amplitude", ticksuffix=" °C"),
title="Temperature Difference in Latitudes")
chart
This chart is hard to read with all series shown together, but you can turn some of them off dynamically by clicking on the legend. In this chart we can see the cycles where one hemisphere is hotter than the other, the same pattern seen in the last chart.
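As an alternative to one `add_trace()` call per latitude band, the difference columns could be reshaped into long format and plotted with a single call that colors by band. This is only a sketch: `to_long` is a hypothetical helper, the toy data frame is made up, and the Plotly call runs only if the package is installed.

```r
# Reshape wide difference columns into long format:
# one row per (Year, band) pair.
to_long <- function(df, cols) {
  stacked <- stack(df[, cols, drop = FALSE])
  data.frame(
    Year = rep(df$Year, times = length(cols)),
    band = as.character(stacked$ind),
    value = stacked$values,
    stringsAsFactors = FALSE
  )
}

# Toy data (made-up values) with the same column layout as gistemp
toy <- data.frame(
  Year = 1880:1882,
  diff.EQU.24 = c(1, 2, 3),
  diff.24.44 = c(4, 5, 6)
)
long <- to_long(toy, c("diff.EQU.24", "diff.24.44"))

# One trace per band is created from the color mapping
if (requireNamespace("plotly", quietly = TRUE)) {
  chart <- plotly::plot_ly(long, x = ~Year, y = ~value, color = ~band,
                           type = "scatter", mode = "lines")
}
```

With the real data, `cols` would be the four `diff.*` columns created above.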