The goal of this analysis is to identify the overall global temperature trend and compare it with the average temperature of the metropolitan city nearest to me, Los Angeles, using moving averages to aid visualization.
I will be using R, specifically the ggplot2 library, to produce my plots. I also wrote a moving average function to help prepare the data for analysis (included below). This report is produced using RMarkdown with PDF output.
To obtain the initial .csv files, I ran two SQL queries, both directly in the browser on Udacity’s Data Analytics Nanodegree program. The queries were as follows:
SELECT * FROM global_data;
SELECT * FROM city_data WHERE country = 'United States' AND city = 'Los Angeles';
Once the CSVs were exported, I imported them into R and loaded the ggplot2 package for later use.
## Loading required package: ggplot2
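The import step itself is not shown. A minimal sketch of it might look like the following, assuming the exports were saved under the hypothetical file names `global_data.csv` and `la_data.csv` and contain the `year` and `avg_temp` columns that the later code uses. The helper `read_temps` is illustrative only and is not part of the original analysis.

```r
# Hypothetical helper: read one exported CSV and check the columns used later.
read_temps <- function(path) {
  df <- read.csv(path)
  stopifnot(all(c("year", "avg_temp") %in% names(df)))
  df
}

# Hypothetical file names; replace them with wherever the exports were saved.
# The guard keeps this sketch runnable even when the files are absent.
if (file.exists("global_data.csv") && file.exists("la_data.csv")) {
  gl_data <- read_temps("global_data.csv")
  la_data <- read_temps("la_data.csv")
}
```

Checking the column names at load time surfaces a bad export immediately, before the moving average step.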
Next, I needed a function to compute moving averages for the global and LA data sets. Since the two data sets cover different year ranges, I kept them separate and ran a roughly decade-long moving average on each, as you will see later in my plots.
With num = 10, the function assigns NA to the first nine and the last ten positions: indices below num are skipped, and the last ten years do not have enough following years to fill a window. Each remaining average covers x[i:(i+num)], a forward-looking window that includes both endpoints and therefore spans eleven consecutive years. The NAs are omitted before the analysis.
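# Moving average of x over a window that starts at position i.
# Note: x[i:(i + num)] includes both endpoints, so it spans num + 1 values.
moving_avg <- function(num, x) {
  mv_avg <- data.frame(0)
  colnames(mv_avg) <- c("mvg_avg")
  for (i in seq_along(x)) {
    if (i < num || i > (length(x) - num)) {
      # Too close to either end of the series: report no average
      mv_avg[i, ] <- NA
    } else {
      mv_avg[i, ] <- mean(x[i:(i + num)])
    }
  }
  mv_avg
}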
mvg_avg_la <- moving_avg(10,la_data$avg_temp)
head(mvg_avg_la, 15)
mvg_avg_gl <- moving_avg(10,gl_data$avg_temp)
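For comparison, here is a sketch of a strict trailing moving average, in which position i averages exactly the n values ending at i. It uses `stats::filter` from base R instead of a loop. The name `trailing_avg` and the toy values are made up for illustration, and this is an alternative formulation rather than the function used to produce the results in this report.

```r
# Trailing moving average: position i averages x[(i - n + 1):i].
# trailing_avg is a hypothetical name, not part of the original report.
trailing_avg <- function(n, x) {
  # sides = 1 applies the weights to the current and past values only
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

toy <- c(14, 15, 16, 15, 14, 16, 17, 15, 16, 17, 18, 16)  # made-up values
ma <- trailing_avg(10, toy)

sum(is.na(ma))  # the first n - 1 positions have no full window
ma[10]          # average of toy[1:10]
```

With this definition only the start of the series is NA, and every reported value covers exactly ten years.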
Now, I omit the NA values and create new data frames that hold the moving averages alongside the original columns for each data set. Notice that the row numbers in the output start at 10 rather than 1: the rows with NA moving averages were dropped, while the remaining rows keep their original numbering.
full_la <- na.omit(cbind(la_data,mvg_avg_la))
full_gl <- na.omit(cbind(gl_data,mvg_avg_gl))
head(full_gl, 5)
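The row numbering behaviour is easy to check on a small example. The sketch below uses a made-up five-year series (the names `toy`, `toy_ma`, and `kept` are illustrative) to show that `na.omit` drops the incomplete rows but keeps the original row names of the rows that survive.

```r
# Made-up five-year series, with NA where no moving average is available.
toy <- data.frame(year = 2001:2005, avg_temp = c(8.1, 8.3, 8.2, 8.4, 8.5))
toy_ma <- data.frame(mvg_avg = c(NA, NA, 8.2, 8.3, NA))

# Same pattern as in the report: bind the columns, then drop incomplete rows.
kept <- na.omit(cbind(toy, toy_ma))

rownames(kept)  # original row numbers of the surviving rows
kept$year       # years that still have a moving average
```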
To take a quick look at our data before the line plots, we can summarize each data set and look at the distribution of its moving average.
For Los Angeles:
summary(full_la); boxplot(full_la$mvg_avg, ylab = 'Avg Temp', xlab = 'Los Angeles', main = 'Moving Avg Distribution')
##       year               city              country       avg_temp
##  Min.   :1858   Los Angeles:146   United States:146   Min.   :14.36
##  1st Qu.:1894                                         1st Qu.:15.44
##  Median :1930                                         Median :15.77
##  Mean   :1930                                         Mean   :15.83
##  3rd Qu.:1967                                         3rd Qu.:16.20
##  Max.   :2003                                         Max.   :17.08
##     mvg_avg
##  Min.   :15.25
##  1st Qu.:15.60
##  Median :15.82
##  Mean   :15.86
##  3rd Qu.:16.05
##  Max.   :16.72
For the global data:
summary(full_gl); boxplot(full_gl$mvg_avg, ylab = 'Avg Temp', xlab = 'Global', main = 'Moving Avg Distribution')
##       year         avg_temp        mvg_avg
##  Min.   :1759   Min.   :6.780   Min.   :7.236
##  1st Qu.:1820   1st Qu.:8.065   1st Qu.:8.057
##  Median :1882   Median :8.350   Median :8.272
##  Mean   :1882   Mean   :8.332   Mean   :8.363
##  3rd Qu.:1944   3rd Qu.:8.650   3rd Qu.:8.642
##  Max.   :2005   Max.   :9.700   Max.   :9.604
The first plot shows the annual averages as a line plot, including all years. This is not the view we ultimately want, because the year-to-year jitter makes the overall trends hard to see.
g_temp <- ggplot(mapping = aes(year, avg_temp)) +
  geom_line(data = la_data, aes(colour = 'red')) +
  geom_line(data = gl_data, aes(colour = 'blue')) +
  guides(colour = guide_legend(title = 'Average')) +
  scale_colour_manual(labels = c('Global', 'Los Angeles'),
                      values = c('blue', 'red')) +
  labs(title = 'Average Annual Temperatures',
       x = 'Year', y = 'Average Temperature') +
  theme(plot.title = element_text(hjust = 0.5))
g_temp
Picking out anything beyond the overall direction of the lines is difficult because of the drastic year-to-year spikes. This is similar to over-fitting: the plot carries too much noise to read comfortably. Thus we use a moving average to smooth our results.
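As an aside on the plotting code, the colour mapping above works because the strings 'red' and 'blue' act as legend keys, which `scale_colour_manual` then relabels by position. A possible alternative, sketched here with made-up stand-in data (`la_toy`, `gl_toy`, `long`, and `series_colours` are all hypothetical names), is to stack both series into one data frame with a `series` column and tie each colour to a series by name. The plot itself is only built if ggplot2 is installed.

```r
# Made-up stand-ins for la_data and gl_data.
la_toy <- data.frame(year = 2000:2002, avg_temp = c(16.1, 16.4, 16.2))
gl_toy <- data.frame(year = 2000:2002, avg_temp = c(9.2, 9.4, 9.5))

# Stack both series and label each row with its source.
long <- rbind(
  cbind(la_toy, series = "Los Angeles"),
  cbind(gl_toy, series = "Global")
)

# Named values tie each colour to a series regardless of ordering.
series_colours <- c("Global" = "blue", "Los Angeles" = "red")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(year, avg_temp, colour = series)) +
    geom_line() +
    scale_colour_manual(values = series_colours) +
    labs(title = "Average Annual Temperatures", x = "Year",
         y = "Average Temperature", colour = "Average")
}
```

Naming the colours means that adding a third series or reordering the layers cannot silently swap the legend labels.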
The next plot shows the moving average, which makes the trend much easier to read. Each point summarizes a window of about a decade; moving forward one year adds the next year to the window and drops the oldest one. This gives us a sliding-window view of our data.
g <- ggplot(mapping = aes(year, mvg_avg)) +
  geom_line(data = full_la, aes(colour = 'red')) +
  geom_line(data = full_gl, aes(colour = 'blue')) +
  guides(colour = guide_legend(title = 'Average')) +
  scale_colour_manual(labels = c('Global', 'Los Angeles'),
                      values = c('blue', 'red')) +
  labs(title = 'Average Temperatures by Decade',
       x = 'Year', y = 'Average Temperature') +
  theme(plot.title = element_text(hjust = 0.5))
g
We can see an upward trend over the course of several decades, indicating warming both globally and locally. Los Angeles also has a much higher temperature than the global average. This makes sense: LA is in Southern California, at a fairly low latitude (about 34°N) with a warm climate, while the global figure also averages over much colder regions.
Furthermore, many of the global trends seem to be more drastic in LA. Looking at the 1900-1950 warming trend, we can see a major spike in the moving average line for Los Angeles over the same interval. The same holds for the years after 1975.
Alarmingly, the global warming trend appears to accelerate from about 1950, with a curve that looks almost exponential. The same trend is visible in LA, but there the increase looks closer to linear.
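The visual impression of acceleration could be checked with a simple number: the least-squares slope of the moving average over different periods. The sketch below is one way to do that, using a hypothetical helper `slope_per_decade` and a made-up series that is flat until 1950 and then rises. It illustrates the calculation only and reports nothing about the real data.

```r
# Hypothetical helper: least-squares slope of value on year between
# from and to (inclusive), expressed in degrees per decade.
slope_per_decade <- function(year, value, from, to) {
  keep <- year >= from & year <= to
  fit <- lm(value[keep] ~ year[keep])
  unname(coef(fit)[2]) * 10
}

# Made-up series: flat until 1950, then rising 0.02 degrees per year.
yr <- 1900:2000
temp <- 8 + pmax(0, yr - 1950) * 0.02

slope_per_decade(yr, temp, 1900, 1950)  # slope of the flat part
slope_per_decade(yr, temp, 1950, 2000)  # slope of the rising part
```

On the real data the call would be something like `slope_per_decade(full_gl$year, full_gl$mvg_avg, 1950, 2005)`, repeated for `full_la` over the same years. A markedly larger slope in the later period than in the earlier one would support the acceleration reading.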
Lastly, I wanted to include a ‘zoomed’ plot representing temperature trends during my lifetime. This plot shows the moving average from 1990 up to the last year of the global moving-average series, which falls within the 2.5 decades I’ve been here on Earth. Interestingly, even though this is only a fraction of the overall data set range, there is a noticeable warming trend both globally and locally.
g_lifetime <- ggplot(mapping = aes(year, mvg_avg)) +
  geom_line(data = full_la, aes(colour = 'red')) +
  geom_line(data = full_gl, aes(colour = 'blue')) +
  # Restrict the x axis to 1990 onward
  xlim(1990, max(full_gl$year)) +
  guides(colour = guide_legend(title = 'Average')) +
  scale_colour_manual(labels = c('Global', 'Los Angeles'),
                      values = c('blue', 'red')) +
  labs(title = 'Average Temperatures Relative to My Lifetime',
       x = 'Year', y = 'Average Temperature') +
  theme(plot.title = element_text(hjust = 0.5))
g_lifetime
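Because `xlim` discards the rows outside the requested range when the plot is drawn, ggplot2 typically warns about removed rows here. An alternative is to filter the data before plotting. The sketch below shows the filtering step on a made-up data frame (`toy` and `recent` are hypothetical names).

```r
# Made-up moving-average series spanning the cut-off year.
toy <- data.frame(year = 1985:1995, mvg_avg = seq(8.0, 9.0, by = 0.1))

# Keep only the years from 1990 onward before handing the data to ggplot.
recent <- subset(toy, year >= 1990)

range(recent$year)
```

Applied to this report, the filtered versions of `full_la` and `full_gl` would replace the originals in the `geom_line` calls, and the `xlim` line could be dropped.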
Overall, we can see warming trends both in Los Angeles and globally. The periods of warming seen locally can also be traced in the global trend, meaning the two are fairly consistent. Locally, Los Angeles sees more drastic changes, but this is expected, since the global figure averages over all regions of Earth. I cannot comment directly on global warming since I am not an expert in the area, but even a very elementary analysis like this points to a warming trend. It would be interesting to trace \(\text{CO}_2\) levels alongside the temperatures above.