This is an exercise to assess the effect of climate change and the relationship between CO2 emission and climate change. You will use the following excel tools: charts, filtering data, frequency tables, bar charts, percentiles and correlation.
Go to the website Goddard Institute for Space Studies. Make sure you right click and open in a new window. From Datasets and GISS Surface Temperature Analysis go down to Tables of Global and Hemispheric Monthly Means and Zonal Annual Means and download the Global-mean monthly, seasonal and annual means, 1880-present updated through most recent month in csv format.
Save the data as an excel workbook.
Take a look at the data and work out what the columns show.
Plot the deviation from the 1950 to 1980 mean. Use January data for 1880 until January 2025. Add titles and dates. It should look like something this:
Explain what the chart shows. You can look at the website to find out more about the data.
Explain columns DJF, MAM, JJA and SON. What does column J-D show. Plot the annual average over time. Add the Horizontal line that shows the average for 1950 to 1980.
You can add some seasons to the main chart to see if they have the same pattern. What do you see?
Take a look at this article from the New York Times. These are really impressive graphs. They are made with R. We can try to produce something similar in excel.
We have a distribution of temperatures. We can use diagrams and descriptive statistics to understand more about these distributions. We will start by creating a frequency table to measure the number of observations in each interval or bucket for June, July and August, between 1951 and 1980
Create a column of buckets that span -0.3 to +1.05. These will be in increments of 5 hundredths.
Get excel to count the number of observation in each bucket. To do this you will need the FREQUENCY function. Create a frequency table that measures the number of observations in each interval or bucket.
Use that data to plot two column charts for the periods 1951 to 1980 and from 1981 to 2010. Compare the two. Compare the two distributions. What do you see?
The New York Times article talks about weather becoming more variable. Let us assess the frequency of ‘extreme’ weather in 1951-1980 compared to 1981-2020. We can also assess the variability of the data with standard deviation and variance.
Using the period 1951 to 1980, characterise the coldest thirty percent as cold and the hottest thirty percent as hot. Use the PERCENTILE.INC function. What is the percentage of months that are ‘hot’ and ‘cold’ in the period 1980 to 2022? You might use the COUNTIF function here. What do your findings suggest?
Calculate the mean and standard deviation for each season and for 1921-1950, 1951-1980 and 1981-2020. Use the STDEV.S function. What does this function do? Has the temperature become more variable?
Now we would like to consider the causes of global warming. These are controversial(ish). You can download data on CO2 from the US National Oceanic and Atmospheric Administration:
You can read about the data here:
Earth System Research Laboratory Article
Look at the data and explain the difference between trend and interpolation.
Plot the trend and interpolation charts for one month (say January).
Combine temperature and CO2 data. You will need to work with data of different frequencies (temperature is annual while CO2 data is monthly).
Draw a scatter plot and calculation the correlation between the two series. You will need the CORREL function. What does the scatter plot and the correlation indicate?
Consider the issue of spurious correlation
A child’s academic performance may be positively correlated with the number of rooms in their house or house size, but could we conclude that building an extra room would make a child smarter, or doing well at school would make your house bigger? It is more plausible that income or wealth, which determines the size of home that a family can afford and the resources available for studying, is the ‘unseen factor’ in this relationship. We could also determine whether income is the reason for this spurious correlation by comparing exam scores for children whose parents have similar income but different house sizes. If there is no correlation between exam scores and house size, then we can deduce that house size was not ‘causing’ exam scores (or vice versa)
How could we try to ensure that this study is not spurious?
This example is taken from the CORE text on data analytics:
You can run through the exercise again using this link. There are instructions in R, Python and Excel.