Synopsis

While looking for a dataset to build a meaningful graphic with plotly, I found global temperature data published by NASA's Global Hydrology Resource Center:

https://ghrc.nsstc.nasa.gov/amsutemps/amsutemps.pl

Coincidentally, the chart on that website no longer works, apparently because no data is available for recent years. That gave me my challenge: to contribute a new graphic. Due to time constraints, this analysis covers only the January values. The document will be extended in the future to include the other months of the year and to draw further conclusions.

Collecting the data

The website provides the temperature data in text format. I took the data and transformed it into an Excel table. The data is a bit messy: it contains many -999 values in recent years, which I decided to treat as NA. The data also breaks one of the rules of tidy data, "one row per observation", since each row holds several temperature observations (one per year). We removed the years 2002 and 2009 to 2017, since those columns contained only the value -999, i.e., NA.

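# Read the tab-separated file, treating -999 as NA
# check.names = FALSE keeps the year column names ("2003", ...) unchanged
tempJan <- read.table("NearSurfaceJan.csv", sep = "\t", header = TRUE, na.strings = "-999", check.names = FALSE)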
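
The removal of the all-NA year columns described above was done by hand before loading. As a minimal sketch, the same step could be done in R after reading the file. The data frame `raw` and its values are made up for illustration and are not the NASA data.

```r
# Toy stand-in for the raw table (values are made up for illustration)
raw <- data.frame(
  Day    = c("01/01", "01/02", "01/03"),
  `2002` = c(NA, NA, NA),
  `2003` = c(258.1, NA, 258.4),
  `2009` = c(NA, NA, NA),
  check.names = FALSE
)

# TRUE for every column in which all values are missing
allMissing <- vapply(raw, function(col) all(is.na(col)), logical(1))

# Keep only the columns that hold at least one real measurement
kept <- raw[, !allMissing, drop = FALSE]
names(kept)
```

Columns with only some missing values, such as the toy "2003" column, are kept, which matches the decision to keep partially observed years.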

Getting and cleaning the data

We now clean the data values. The month information is deleted from the "Day" column so that it holds only the day of the month (of January, in this case).

# Delete the leading month prefix ("01/") from the Day column and convert to numeric
# sub() is vectorised, so no sapply() is needed
tempJan$Day <- as.numeric(sub("^01/", "", tempJan$Day))
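
To see what the anchored pattern does, here is a small sketch on hypothetical "Day" strings. The month/day form "01/DD" is an assumption inferred from the pattern used above.

```r
# Hypothetical "Day" strings in the assumed month/day form
dayRaw <- c("01/01", "01/15", "01/31")

# "^01/" matches only the leading month prefix, so the day "01" itself is left intact
days <- as.numeric(sub("^01/", "", dayRaw))
days
```

Without the `^` anchor, a pattern such as "01/" would still work for this format, but the anchor makes it explicit that only the start of the string is touched.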

Now we break each row into several observations, so that each row is a single temperature observation for a specific day and year. The remaining year columns are taken as the measure variables and the temperature is the value. The result has two new columns, renamed to "year" and "temp", and more rows than the original dataset (one row per day and year).

# melt() is assumed to come from reshape2 and rename() from dplyr
library(reshape2)
library(dplyr)
tempMelt <- melt(tempJan, id = c("Day"), measure.vars = c("2003", "2004", "2005", "2006", "2007", "2008"))
tempMelt <- rename(tempMelt, year = variable, temp = value)
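
The wide-to-long step can be illustrated without any package. The following sketch is a base-R equivalent on a made-up two-day, two-year table, not the call used in this analysis; `wide`, `long` and the values are hypothetical.

```r
# Toy wide table: one row per day, one column per year (made-up Kelvin values)
wide <- data.frame(
  Day    = c(1, 2),
  `2003` = c(258.1, 258.3),
  `2004` = c(258.0, NA),
  check.names = FALSE
)
years <- c("2003", "2004")

# Long table: one row per (day, year) pair
long <- data.frame(
  Day  = rep(wide$Day, times = length(years)),
  year = factor(rep(years, each = nrow(wide))),
  temp = unlist(wide[years], use.names = FALSE)
)

nrow(long)  # 2 days x 2 years
```

The row count of the long table is the number of days times the number of year columns, and missing values travel with their (day, year) pair.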

The temperatures are given in Kelvin, so we convert them to degrees Celsius, which are more commonly used around the world. There are 2 NA values, which are kept as NA.

# Convert Kelvin to degrees Celsius: °C = K - 273.15
# Arithmetic on NA yields NA, so missing values are preserved without a special case
intoCelsius <- function(t) {
    t - 273.15
}
tempMelt$temp <- intoCelsius(tempMelt$temp)
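
As a quick check of the conversion, this sketch applies the formula to a few made-up Kelvin readings (not values from the dataset), including a missing one.

```r
# Made-up Kelvin readings, including one missing value
kelvin <- c(273.15, 258.15, NA)

# Subtraction is vectorised and NA propagates on its own
celsius <- kelvin - 273.15
celsius  # approximately 0, -15, NA
```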

Plotting the data with plotly

The dataset is now ready to show us some trends. Each year is given a different colour to make possible patterns easier to distinguish.

# Data source: Global Hydrology Resource Center, NASA
library(plotly)
p <- plot_ly(tempMelt, x = ~Day, y = ~temp, type = 'scatter', mode = 'lines', color = ~year) %>%
      layout(title = "January daily global average temperature of the near-surface layer",
             paper_bgcolor = 'rgb(255,255,255)', plot_bgcolor = 'rgb(229,229,229)',
             xaxis = list(title = "Day"),
             yaxis = list(title = "Temperature (°C)"))
p

Let's add a bar chart of the mean temperature per year.

tempPerYear <- aggregate(temp ~ year, tempMelt, mean)
tempPerYear
  year      temp
1 2003 -14.74081
2 2004 -14.84474
3 2005 -14.78203
4 2006 -14.95739
5 2007 -14.63568
6 2008 -15.27529
library(ggplot2)
g <- ggplot(tempPerYear, aes(x = year, y = temp)) +
    geom_bar(stat = "identity", aes(fill = year)) +
    coord_flip() +
    labs(x = "Year", y = "Temperature (°C)", title = "January mean temperature per year")
g
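
One detail of the `aggregate()` call above is worth spelling out: with the formula interface, rows whose temperature is NA are dropped before the mean is computed (the default `na.action` is `na.omit`), so the 2 NA values do not turn a yearly mean into NA. A minimal sketch on made-up values:

```r
# Toy long table with one missing temperature (made-up values)
toy <- data.frame(
  year = factor(c("2003", "2003", "2004", "2004")),
  temp = c(-14, -16, -15, NA)
)

# Formula interface: rows with NA are omitted before mean() is applied
byFormula <- aggregate(temp ~ year, toy, mean)

# Same result with tapply(), where NA removal has to be requested explicitly
byTapply <- tapply(toy$temp, toy$year, mean, na.rm = TRUE)

byFormula
```

In the toy table the 2004 mean is computed from the single available value rather than reported as NA.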

Conclusion

The January data suggest at most a slight warming of the average global near-surface temperature between 2003 (-14.74 °C) and 2007 (-14.64 °C), although the intermediate years fluctuate. The last measured year, 2008, was cooler (-15.28 °C) than the previous year, 2007. We need data from the other months of the year to check whether 2008 was an exception only in January.
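
One way to put a number on the trend is a least-squares slope through the yearly means. The sketch below is not part of the original analysis; it reuses the January means from the table above, and with only five or six points the fit is purely descriptive.

```r
# Yearly January means copied from the table above (degrees Celsius)
means <- data.frame(
  year = 2003:2008,
  temp = c(-14.74081, -14.84474, -14.78203, -14.95739, -14.63568, -15.27529)
)

# Least-squares slope in degrees Celsius per year, without and with 2008
slope0307 <- coef(lm(temp ~ year, data = subset(means, year <= 2007)))[["year"]]
slope0308 <- coef(lm(temp ~ year, data = means))[["year"]]

c(slope0307 = slope0307, slope0308 = slope0308)
```

The 2003 to 2007 slope is slightly positive, while including 2008 makes it negative, which shows how much the conclusion depends on a single year.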