The original data visualization selected for the assignment is shown below:
The image above illustrates the historical evolution of annual average temperatures in the contiguous 48 states of the United States of America. The surface temperature data is derived from land-based weather stations, while satellites are used to monitor the lower troposphere, the Earth’s lowest atmospheric layer. The “UAH” and “RSS” labels signify two distinct approaches to interpreting the original satellite data.
The objective and audience of the original data visualization chosen can be summarized as follows:
Objective: The objective of the visualization is to communicate the long-term trend in United States temperatures and the causes and effects of climate change over time. To provide a clear representation of temperature change over time, the graph covers the period from 1901 to 2020. The visualization communicates the following:
Audience: The audience for the visualization is:
The visualization chosen had the following issues:
#issue 1: Missing Information: The graph lacks a clear legend indicating what each line represents. The dataset also contains null values, which may have distorted the visualization.
#issue 2: Overlapping Lines: The lines representing the different data series (Earth’s surface, lower troposphere measured by satellite (UAH), and lower troposphere measured by satellite (RSS)) overlap in several places. This makes it difficult to distinguish between the series and understand their individual trends.
#issue 3: Color Choices: The use of two colors to represent the Earth’s surface temperature makes it difficult for viewers to understand what is being shown. The closely packed colors (blue, red, green, and orange) may also be hard to tell apart for viewers with color vision deficiencies.
The data provided (NOAA 2022) consists of 121 records and 4 columns. The data is limited by the smaller number of weather stations in the early 20th century, so uncertainty in the surface temperature data increases the further back in time one goes. The following code was used to clean the data and fix the issues identified in the dataset provided.
# Load the 'readxl' library for reading Excel files in R.
library(readxl)
# Load the 'ggplot2' library for creating data visualizations using the Grammar of Graphics framework.
library(ggplot2)
# Load the 'dplyr' library for efficient data manipulation tasks such as filtering, sorting, and summarizing.
library(dplyr)
# Load the 'gridExtra' library for additional functionality in arranging and combining multiple plots into layouts.
library(gridExtra)
# Load the 'reshape2' library for data reshaping and restructuring, particularly useful for transforming data formats.
library(reshape2)
# Read data from CSV file
data <- read.csv("~/LEARNING/R/Temperature_USA.csv")
# Review the data types, column names, and first few values
glimpse(data)
## Rows: 121
## Columns: 4
## $ Year <int> 1901, 1902, 1903, 1904…
## $ Earth.s.surface..land.and.ocean. <dbl> -0.270, -0.468, -0.666…
## $ Lower.troposphere..measured.by.satellite...UAH. <chr> "null", "null", "null"…
## $ Lower.troposphere..measured.by.satellite...RSS. <chr> "null", "null", "null"…
# Reviewing the structure of the data
str(data)
## 'data.frame': 121 obs. of 4 variables:
## $ Year : int 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 ...
## $ Earth.s.surface..land.and.ocean. : num -0.27 -0.468 -0.666 -0.828 -0.504 -0.378 -0.684 -0.774 -0.792 -0.72 ...
## $ Lower.troposphere..measured.by.satellite...UAH.: chr "null" "null" "null" "null" ...
## $ Lower.troposphere..measured.by.satellite...RSS.: chr "null" "null" "null" "null" ...
# Clean the data by replacing the "null" with NA in the dataset
data[data == "null"] <- NA
# Convert the temperature columns to numeric; lapply works column by column and avoids coercing the data frame to a character matrix
data[2:4] <- lapply(data[2:4], as.numeric)
#Confirm the structure of the dataset after cleaning
str(data)
## 'data.frame': 121 obs. of 4 variables:
## $ Year : int 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 ...
## $ Earth.s.surface..land.and.ocean. : num -0.27 -0.468 -0.666 -0.828 -0.504 -0.378 -0.684 -0.774 -0.792 -0.72 ...
## $ Lower.troposphere..measured.by.satellite...UAH.: num NA NA NA NA NA NA NA NA NA NA ...
## $ Lower.troposphere..measured.by.satellite...RSS.: num NA NA NA NA NA NA NA NA NA NA ...
#Reviewing post cleaning
glimpse(data)
## Rows: 121
## Columns: 4
## $ Year <int> 1901, 1902, 1903, 1904…
## $ Earth.s.surface..land.and.ocean. <dbl> -0.270, -0.468, -0.666…
## $ Lower.troposphere..measured.by.satellite...UAH. <dbl> NA, NA, NA, NA, NA, NA…
## $ Lower.troposphere..measured.by.satellite...RSS. <dbl> NA, NA, NA, NA, NA, NA…
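The two-step cleaning above (replace "null" with NA, then convert to numeric) could also be folded into the import step. The following is a minimal sketch, assuming the missing values always appear as the literal text "null". The shortened column names and the final 2020 row are made up for illustration; the first three rows are copied from the glimpse() output above.

```r
# First three rows as shown in the glimpse() output above; the 2020 row is a
# made-up placeholder so the satellite columns contain one non-missing value.
csv_text <- "Year,surface,uah,rss
1901,-0.270,null,null
1902,-0.468,null,null
1903,-0.666,null,null
2020,1.0,0.9,1.1"

# na.strings tells read.csv to treat the literal text "null" as NA,
# so the satellite columns are parsed as numeric from the start
sample_data <- read.csv(text = csv_text, na.strings = "null")

# Confirm the column types and count the missing values per column
str(sample_data)
colSums(is.na(sample_data))
```

With the real file, the same na.strings argument would be passed to the read.csv call that loads Temperature_USA.csv, which would make the separate replacement and conversion steps unnecessary.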
# List the column names in the dataset
colnames(data)
## [1] "Year"
## [2] "Earth.s.surface..land.and.ocean."
## [3] "Lower.troposphere..measured.by.satellite...UAH."
## [4] "Lower.troposphere..measured.by.satellite...RSS."
# Rename the columns with shorter, descriptive names
colnames(data) <- c("Year", "earth_surface", "uah_lower_troposphere", "rss_lower_troposphere")
In order to reconstruct the visualization, we carry out the following activities:
To address the missing information observed in the critique, we clean the data and standardize it for analysis. The null values in the dataset for certain years can indicate missing or incomplete data, which can affect the accuracy of any analysis or interpretation.
We use a colorblind-friendly palette (blue, black, and orange) to make the visualization more accessible.
We use ggplot2 to address the overlapping lines because it offers a flexible, customizable framework. Its layered approach supports adding layers (geoms) such as points, lines, bars, and smoothed lines to convey the intended information. It also generates legends automatically, which addresses the missing legend identified in the critique.
The following plots address the overlapping lines by showing the Earth’s surface, the lower troposphere measured by satellite (UAH), and the lower troposphere measured by satellite (RSS) as separate series.
# Create a scatterplot with smoothed line (geom_smooth)
ggplot(data, aes(x = Year, y = earth_surface)) +
  geom_point() + # Add points for the scatterplot
  geom_smooth(method = "lm", se = FALSE, color = "blue") + # Add linear regression line
  labs(title = "Scatterplot with Smoothed Line of Earth Surface Temperatures", x = "Year", y = "Temperature (°F)")
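The straight line drawn by geom_smooth(method = "lm") is an ordinary least-squares fit of temperature on year, so the same trend can also be inspected numerically with lm(). The sketch below is illustrative only: it uses just the first ten surface values printed in the str() output above, and the names first_decade and trend_fit are assumptions, so the slope it reports says nothing about the full record.

```r
# First ten rows only, copied from the str() output above
first_decade <- data.frame(
  Year = 1901:1910,
  earth_surface = c(-0.27, -0.468, -0.666, -0.828, -0.504,
                    -0.378, -0.684, -0.774, -0.792, -0.72)
)

# Ordinary least-squares fit of temperature on year
trend_fit <- lm(earth_surface ~ Year, data = first_decade)

# The "Year" coefficient is the fitted change in temperature per year
coef(trend_fit)
```

Fitting the same model to the full cleaned dataset would give the slope of the line shown in the plot.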
ggplot(data, aes(x = Year, y = uah_lower_troposphere)) +
  geom_point() + # Add points for the scatterplot
  geom_smooth(method = "lm", se = FALSE, color = "black") + # Add linear regression line
  labs(title = "Scatterplot with Smoothed Line of UAH Lower Troposphere Temperatures", x = "Year", y = "Temperature (°F)")
ggplot(data, aes(x = Year, y = rss_lower_troposphere)) +
  geom_point() + # Add points for the scatterplot
  geom_smooth(method = "lm", se = FALSE, color = "orange") + # Add linear regression line
  labs(title = "Scatterplot with Smoothed Line of RSS Lower Troposphere Temperatures", x = "Year", y = "Temperature (°F)")
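The named colors used in the plots above (blue, black, and orange) can also be given as explicit hex codes from the Okabe-Ito palette, which was designed to stay distinguishable under common color vision deficiencies. The sketch below shows one possible way to do that and is not the method used in this report; the vector name series_colors and the toy values are assumptions made for illustration.

```r
library(ggplot2)

# Okabe-Ito hex codes for blue, black, and orange, keyed by series name
# (assumed substitutes for the named R colors used in the plots above)
series_colors <- c(
  earth_surface         = "#0072B2",
  uah_lower_troposphere = "#000000",
  rss_lower_troposphere = "#E69F00"
)

# Toy long-format data; the values are made up for illustration only
toy <- data.frame(
  Year     = rep(2001:2003, times = 3),
  variable = factor(rep(names(series_colors), each = 3),
                    levels = names(series_colors)),
  value    = c(0.1, 0.2, 0.3, 0.2, 0.1, 0.4, 0.3, 0.3, 0.5)
)

# Mapping color inside aes() lets ggplot2 build the legend automatically;
# scale_color_manual then assigns the palette by series name
palette_plot <- ggplot(toy, aes(x = Year, y = value, color = variable)) +
  geom_line() +
  scale_color_manual(values = series_colors, name = "Series")

palette_plot
```

Keying the vector by series name keeps each series on the same color regardless of the order in which the series appear in the data.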
Facet Wrapping our Visualization - We facet wrap the visualization to allow comparison of the columns that make up the dataset.
# Create a combined dataset with melted data for easy plotting
melted_data <- melt(data, id.vars = "Year")
# Create a scatterplot with smoothed line, faceted by variable
ggplot(data = melted_data, aes(x = Year, y = value, color = variable)) +
  geom_point() + # Add points
  geom_smooth(method = "lm", se = FALSE) + # Add linear regression line
  labs(title = "Scatterplots with Smoothed Lines for Temperature Data", x = "Year", y = "Temperature (°F)") +
  facet_wrap(~ variable, scales = "free_y", ncol = 1) + # Facet by variable, allow y-axis to vary, 1 column
  scale_color_manual(values = c("earth_surface" = "blue",
                                "uah_lower_troposphere" = "black",
                                "rss_lower_troposphere" = "orange")) + # Define custom colors
  theme_minimal() + # Minimal theme
  theme(legend.position = "bottom") # Legend position
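The melt() step that feeds the faceted plot turns the wide table (one column per series) into a long table with one row per year and series, which is the shape that facet_wrap and the color aesthetic need. A minimal sketch on a two-row toy data frame follows; the values are made up for illustration and the name toy_wide is an assumption.

```r
library(reshape2)

# Toy wide-format data with the same column names as the cleaned dataset;
# the values are made up for illustration only
toy_wide <- data.frame(
  Year                  = c(1901, 1902),
  earth_surface         = c(-0.3, -0.5),
  uah_lower_troposphere = c(NA, 0.1),
  rss_lower_troposphere = c(NA, 0.2)
)

# Year is kept as the identifier; every other column is stacked into
# a "variable" column (the series name) and a "value" column (the reading)
toy_long <- melt(toy_wide, id.vars = "Year")

toy_long
```

Each of the three measured columns contributes one row per year, so the two-row toy table becomes six rows, and the missing satellite readings are carried over as NA.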
Reference list
Baglin, J 2023, ‘Data Visualisation: From Theory to Practice’, RMIT Online, RMIT University Melbourne,
MATH2402, ‘Data Visualisation and Communication’, RMIT Online, RMIT University Melbourne, viewed 12 September 2023, https://rmit.instructure.com/courses/95340/pages/1-dot-3-1-data-visualisation-critique?module_item_id=3696530.
NOAA (National Oceanic and Atmospheric Administration) 2022, Climate at a glance, viewed March 2022.
RStudio 2021, RStudio Cheatsheets, RStudio, viewed 22 July 2021, https://www.rstudio.com/resources/cheatsheets/.
US EPA 2016, Climate Change Indicators: U.S. and Global Temperature, US EPA, viewed 10 September 2023, https://www.epa.gov/climate-indicators/climate-change-indicators-us-and-global-temperature.