Introduction

This is my submission for Data Visualization Programming Assignment 01.

If you are a peer reviewer, feel free to jump straight to the “Chart” section. The rest of this document walks step by step through reading, cleaning and preparing the data for the plot using R notebooks and Plotly.

Assignment Overview

This assignment will give you a chance to explore the topics covered in Week 2 of the course by visualizing some data as a chart. The data set we provided deals with world temperatures and comes from NASA. You are welcome to use the additional resources, especially if you do not want to program to complete this project.

Instructions

1. Getting the data set

Take the data from the GISTEMP site, specifically the data from “Table Data: Global and Hemispheric Monthly Means and Zonal Annual Means.” Alternatively you can use any data that you would like to explore instead.

2. Parsing the data set

# loading dataset
gistemp <- read.csv("./data/ExcelFormattedGISTEMPData2CSV.csv")
dim(gistemp) # dimension
## [1] 135  15
str(gistemp) # what is this structure?
## 'data.frame':    135 obs. of  15 variables:
##  $ Year    : int  1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 ...
##  $ Glob    : int  -19 -10 -9 -19 -27 -31 -30 -33 -20 -11 ...
##  $ NHem    : int  -33 -18 -17 -30 -42 -41 -39 -37 -22 -16 ...
##  $ SHem    : int  -5 -2 -1 -8 -12 -21 -21 -28 -17 -6 ...
##  $ X24N.90N: int  -38 -27 -21 -34 -56 -61 -49 -46 -42 -25 ...
##  $ X24S.24N: int  -16 -2 -10 -22 -17 -17 -24 -27 7 4 ...
##  $ X90S.24S: int  -5 -5 4 -2 -11 -20 -20 -26 -33 -17 ...
##  $ X64N.90N: int  -89 -54 -125 -28 -127 -119 -124 -158 -141 -82 ...
##  $ X44N.64N: int  -54 -40 -20 -57 -58 -70 -43 -52 -43 -13 ...
##  $ X24N.44N: int  -22 -14 -3 -20 -41 -43 -38 -21 -22 -21 ...
##  $ EQU.24N : int  -26 -5 -12 -25 -21 -11 -24 -24 7 -3 ...
##  $ X24S.EQU: int  -5 2 -8 -19 -14 -23 -24 -31 8 11 ...
##  $ X44S.24S: int  -2 -6 3 -1 -15 -27 -18 -24 -30 -16 ...
##  $ X64S.44S: int  -8 -3 8 0 -5 -7 -21 -29 -38 -17 ...
##  $ X90S.64S: int  39 37 42 37 40 38 28 21 16 19 ...

The dataset contains annual temperature means: apparently the global average, the Northern and Southern Hemisphere averages, and averages for several latitude zones.
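The X prefixes and dots in names such as X24N.90N are most likely produced by read.csv, which by default passes the header row through make.names to obtain syntactically valid R names. A minimal sketch, assuming the raw headers were written like "24N-90N" (an assumption, since the raw CSV header is not shown in this document):

```r
# ASSUMPTION: these are the raw CSV headers; the file itself is not shown here
raw_headers <- c("Year", "Glob", "24N-90N", "EQU-24N")

# read.csv(check.names = TRUE), the default, applies make.names() to the headers
clean_headers <- make.names(raw_headers)
print(clean_headers)
```

Passing check.names = FALSE to read.csv would keep the original headers, at the cost of needing backticks to reference those columns.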

Let’s characterize some of the data

summary( gistemp[, c("Glob","NHem", "SHem")] )
##       Glob             NHem              SHem          
##  Min.   :-47.00   Min.   :-52.000   Min.   :-47.00000  
##  1st Qu.:-20.00   1st Qu.:-21.500   1st Qu.:-22.50000  
##  Median : -8.00   Median : -2.000   Median : -9.00000  
##  Mean   :  1.63   Mean   :  3.326   Mean   : -0.07407  
##  3rd Qu.: 17.50   3rd Qu.: 16.000   3rd Qu.: 25.00000  
##  Max.   : 75.00   Max.   : 91.000   Max.   : 59.00000

Aren’t these values too high (or too low) to be temperatures in degrees Celsius?
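One possible answer, stated here as an assumption because the CSV itself does not declare its units: GISTEMP tables are usually published as temperature anomalies (deviations from a reference-period mean) in hundredths of a degree Celsius. Under that assumption, dividing by 100 gives values in °C. The sketch below uses the first ten Year and Glob values from the str() output above; the GlobC column name is made up for illustration.

```r
# First ten rows of Year and Glob, copied from the str() output above
sample_temps <- data.frame(
  Year = 1880:1889,
  Glob = c(-19, -10, -9, -19, -27, -31, -30, -33, -20, -11)
)

# ASSUMPTION: values are anomalies in hundredths of a degree Celsius
sample_temps$GlobC <- sample_temps$Glob / 100

print(range(sample_temps$GlobC))
```

If the assumption holds, the same division would apply to every temperature column before labelling an axis in °C.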

3. Visualize the data

Let’s choose the chart type using the rules from the classes:

Chart Selection

In our case we have:

  • Independent Variable: YEAR (Quantitative Discrete)
  • Dependent Variable: Temperature (Quantitative Continuous)

So, following the recommendation table, we should use a bar chart.

Chart

library(plotly)
library(magrittr)

# dataset and x var
chart <- plot_ly(gistemp, x=~Year) %>% 

          # Global Average
          add_trace( y=~Glob, type = 'bar', name="Global Average") %>%

          # naming Y axis
          layout( yaxis=list(title="Temperature °C", ticksuffix=" °C"),
                  title="Global Average Temperature")


chart

Questions to be answered about the visualization

What are your x- and y-axes?

We are using Year in the X-Axis and Global Average Temperature (Glob) in the Y-Axis.

Did you use a subset of the data? If so, what was it?

We are using all data points but only the Year and Glob columns. We tried plotting the other columns together in a bar chart, but the information became difficult to read.

Are there any particular aspects of your visualization to which you would like to bring attention?

Although the independent variable (Year) is quantitative discrete, there are so many years (when you use all data points) that a line chart is also worth considering: Year can be treated as quantitative continuous without losing information.

What do you think the data and your visualization show?

The chart shows that the global average temperature oscillates a lot from year to year, but there has been a steady increasing trend since around 1910.
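To separate the year-to-year oscillation from the longer trend, one option is a centered moving average. This is only a sketch: the helper name moving_average and the 5-year window are arbitrary choices for illustration, and the ten values are the first Glob entries from the str() output (with the full data you would pass gistemp$Glob instead).

```r
# Centered moving average; returns NA where the window does not fit
moving_average <- function(x, k = 5) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

# First ten Glob values (1880 to 1889), copied from the str() output
glob <- c(-19, -10, -9, -19, -27, -31, -30, -33, -20, -11)

smoothed <- moving_average(glob, k = 5)
print(smoothed)
```

A wider window smooths more aggressively but leaves more NA values at both ends of the series.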

Extension

In this section I explore other chart types for this dataset, only to check the possibilities. If you are a peer reviewer, you can disregard the remainder of this document.

Line chart

Trying to see the same information treating Year as continuous data and choosing a line chart:

# dataset and x var
chart <- plot_ly(gistemp, x=~Year) %>% 

          # Global Average
          add_trace( y=~Glob, type = 'scatter', mode="lines", name="Global Average") %>%

          # naming Y axis
          layout( yaxis=list(title="Temperature °C", ticksuffix=" °C"),
                  title="Global Average Temperature")


chart

This is a good chart too. It is easy to assume that the temperature changes continuously between data points. We can clearly see the year-to-year oscillation and the increasing trend.

Plotting more data together

# dataset and x var
chart <- plot_ly(gistemp, x=~Year) %>% 

          # North Hemisphere Temperature
          add_trace( y=~NHem, type = 'bar', name="North Hemisphere") %>%

          # Global Average
          add_trace( y=~Glob, type = 'bar', name="Global Average") %>%

          # South Hemisphere Temperature
          add_trace( y=~SHem, type = 'bar', name="South Hemisphere") %>%

          # naming Y axis
          layout( yaxis=list(title="Temperature", ticksuffix=" °C"))


chart

This chart is bad. It is difficult to see any relationship between the temperatures; we basically see, with added noise, the same information as when plotting only the global average.

Maybe if we plot the differences from the global average we can see something different.

# subtracting the global average from the North and South Hemisphere temperatures
gistemp$NorthDiff <- gistemp$NHem - gistemp$Glob
gistemp$SouthDiff <- gistemp$SHem - gistemp$Glob

# dataset and x var
chart <- plot_ly(gistemp, x=~Year) %>% 

          # North Hemisphere difference from global average
          add_trace( y=~NorthDiff, type = 'bar', name="North Hemisphere") %>%
          # South Hemisphere difference from global average
          add_trace( y=~SouthDiff, type = 'bar', name="South Hemisphere") %>%
          # naming Y axis
          layout( yaxis=list(title="Temperature Amplitude", ticksuffix=" °C"),
                  title="Temperature Difference from Global Average")


chart

Now we can see a pattern worth exploring! For several years in a row, the South Hemisphere is hotter than the North Hemisphere relative to the global average (from 1880 to 1920 and from 1969 to 1992), and in other intervals the North is hotter than the South. Is this a real pattern?
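A quick way to check whether such streaks exist in the numbers, rather than only to the eye, is to run-length encode the comparison SHem > NHem (equivalent to comparing the two differences from the global average). A minimal sketch using the first ten NHem and SHem values from the str() output; the variable names are illustrative, and with the full data you would use gistemp$SHem > gistemp$NHem:

```r
# First ten values (1880 to 1889), copied from the str() output
nhem <- c(-33, -18, -17, -30, -42, -41, -39, -37, -22, -16)
shem <- c(-5, -2, -1, -8, -12, -21, -21, -28, -17, -6)

# rle() collapses consecutive equal values into (value, run length) pairs
runs <- rle(shem > nhem)
streaks <- data.frame(south_warmer = runs$values, years = runs$lengths)
print(streaks)
```

In this ten-year sample the South is warmer in every year, so the result is a single streak of length 10.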

Can we see this pattern in other columns? Let’s look at the difference between North and South at different latitudes:

# difference between each northern band and the southern band at the same latitudes
gistemp$diff.EQU.24 <- gistemp$EQU.24N - gistemp$X24S.EQU
gistemp$diff.24.44 <- gistemp$X24N.44N - gistemp$X44S.24S
gistemp$diff.44.64 <- gistemp$X44N.64N - gistemp$X64S.44S
gistemp$diff.64.90 <- gistemp$X64N.90N - gistemp$X90S.64S


# dataset and x var
chart <- plot_ly(gistemp, x=~Year) %>% 

          # Equator to 24 degrees
          add_trace( y=~diff.EQU.24, type = 'scatter', mode="lines", name="EQ-24") %>%

          # 24 to 44 degrees
          add_trace( y=~diff.24.44, type = 'scatter', mode="lines", name="24-44") %>%

          # 44 to 64 degrees
          add_trace( y=~diff.44.64, type = 'scatter', mode="lines", name="44-64") %>%

          # 64 to 90 degrees
          add_trace( y=~diff.64.90, type = 'scatter', mode="lines", name="64-90") %>%

          # naming Y axis
          layout( yaxis=list(title="Temperature Amplitude", ticksuffix=" °C"),
                  title="Temperature Difference in Latitudes")


chart

With all the lines shown together this chart is hard to read, but you can turn some of them off dynamically. In this chart we can see the cycles where one hemisphere is hotter than the other, the same pattern seen in the last chart.
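Because the band-by-band differences rely on matching each northern band with the southern band at the same latitudes, it can help to write the pairing down once as a lookup table and compute every difference from it. This is a sketch under assumptions: the band_pairs and zones names are hypothetical, and zones holds only the first three years (1880 to 1882) copied from the str() output; with the full data you would use gistemp in its place.

```r
# Each northern band paired with the southern band at the same latitudes
band_pairs <- data.frame(
  north = c("EQU.24N", "X24N.44N", "X44N.64N", "X64N.90N"),
  south = c("X24S.EQU", "X44S.24S", "X64S.44S", "X90S.64S"),
  label = c("EQ-24", "24-44", "44-64", "64-90"),
  stringsAsFactors = FALSE
)

# First three years (1880 to 1882) of each band, from the str() output
zones <- data.frame(
  EQU.24N  = c(-26, -5, -12),   X24S.EQU = c(-5, 2, -8),
  X24N.44N = c(-22, -14, -3),   X44S.24S = c(-2, -6, 3),
  X44N.64N = c(-54, -40, -20),  X64S.44S = c(-8, -3, 8),
  X64N.90N = c(-89, -54, -125), X90S.64S = c(39, 37, 42)
)

# One column of north-minus-south differences per band pair
diffs <- mapply(function(n, s) zones[[n]] - zones[[s]],
                band_pairs$north, band_pairs$south)
colnames(diffs) <- band_pairs$label
print(diffs)
```

Keeping the pairing in one table makes it easy to review by eye and avoids repeating column names across several hand-written assignments.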