1 Introduction

Research Question: How has temperature changed across the world since 1975?

We got our data from data.world; the data set records the average monthly temperature for every country since 1743. The data are very extensive, but we could not use all of them: many of the countries listed in 1743 no longer exist, and some data points from the mid-1700s are blank, so they do not give a full view of temperature patterns at that time. We therefore restricted our data to 1975 through 2013 (the most recent data available), because this period gives a more accurate and clearer picture of temperature patterns and how they have changed by continent. The limitation of looking only at 1975 through 2013 is that our sample size becomes smaller, but we judged that this is outweighed by the quality of the data during this period.
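The restriction and aggregation described above can be sketched as follows. This is a minimal, hypothetical illustration in plain Python: the row layout, the country-to-continent lookup, and the use of a simple unweighted mean across countries are all assumptions rather than the exact procedure we used, and the rows are invented toy values, not real data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Toy country-level rows: (country, dt, average temperature in degrees C).
# These values are invented for illustration; they are not from the data set.
rows = [
    ("Kenya", date(1974, 12, 1), 24.1),   # before 1975: dropped
    ("Kenya", date(1975, 2, 1), 25.0),
    ("Egypt", date(1975, 2, 1), 15.2),
    ("Egypt", date(1975, 3, 1), None),    # blank reading: dropped
    ("Prussia", date(1975, 2, 1), 8.0),   # no continent mapping: dropped
    ("France", date(1975, 2, 1), 5.3),
]

# Hypothetical country -> continent lookup (the real mapping is not shown here).
CONTINENT_OF = {"Kenya": "Africa", "Egypt": "Africa", "France": "Europe"}


def continent_monthly_means(rows, start=date(1975, 1, 1), end=date(2013, 12, 31)):
    """Average temperature by (continent, month) over the restricted period."""
    groups = defaultdict(list)
    for country, dt, temp in rows:
        # Skip blank readings and dates outside the study window.
        if temp is None or not (start <= dt <= end):
            continue
        continent = CONTINENT_OF.get(country)
        if continent is None:  # e.g. a country that no longer exists
            continue
        groups[(continent, dt)].append(temp)
    # Simple unweighted mean across countries (an assumption).
    return {key: mean(temps) for key, temps in sorted(groups.items())}


for (continent, dt), avg in continent_monthly_means(rows).items():
    print(continent, dt.isoformat(), round(avg, 5))
```

With these toy rows, only February 1975 survives for Africa (Kenya and Egypt averaged) and for Europe (France).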


2 Exploratory data analysis

continent  dt          average_temp_by_continent
Africa     1975-02-01  23.39468
Africa     1975-03-01  24.28253
Africa     1975-04-01  24.80840
Africa     1975-05-01  24.68034
Africa     1975-06-01  24.03911
Africa     1975-07-01  23.50423

The first data table shows the first six observations of our data set (2320 observations in total), which has three columns: continent, dt, and average_temp_by_continent. The variable continent is categorical, with five levels: Africa, Americas, Asia, Europe, and Oceania. The variable dt is a date recording the first day of each month from 1975 to 2013. The variable average_temp_by_continent is numerical and gives the average temperature (degrees Celsius) for a given continent in a given month.
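As a rough consistency check on the size of the data set, the sketch below assumes every continent has exactly one observation per month with no gaps. Under that assumption, 2320 observations over five continents gives 464 months, which, starting from February 1975 (the first row in the table), would run through September 2013. The helper `add_months` is hypothetical.

```python
from datetime import date


def add_months(d, k):
    """Hypothetical helper: the first of the month k months after d."""
    total = d.year * 12 + (d.month - 1) + k
    return date(total // 12, total % 12 + 1, 1)


n_obs, n_continents = 2320, 5
assert n_obs % n_continents == 0                       # balanced panel assumed
months_per_continent = n_obs // n_continents           # 464 monthly readings each
first_month = date(1975, 2, 1)                         # first row in the table
last_month = add_months(first_month, months_per_continent - 1)
print(months_per_continent, last_month.isoformat())    # 464 2013-09-01
```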

Based on the first graphic, the average temperature appears to increase from 1975 to 2013 for each continent (with the possible exception of Oceania). The large fluctuations within each continent are due to the seasons: average temperatures swing between winter and summer every year. The trend is especially clear in Africa and the Americas, where the average temperature appears to have increased by almost half a degree Celsius from 1975 to 2013.

The second graphic shows that this trend holds worldwide. A best-fit line suggests that the worldwide average temperature increased by almost a full degree Celsius from 1975 to 2013. We evaluate these results quantitatively below using our regression tables.


3 Multiple regression

Parallel slopes model by continent:
term                 estimate  std_error   statistic   p_value     conf_low    conf_high
intercept          23.8261954  0.2791076   85.365642 0.0000000   23.2788677   24.3735230
continentAmericas  -2.7582889  0.2823442   -9.769243 0.0000000   -3.3119635   -2.2046142
continentAsia      -2.8684458  0.2823442  -10.159395 0.0000000   -3.4221204   -2.3147712
continentEurope   -15.7116902  0.2823442  -55.647299 0.0000000  -16.2653648  -15.1580156
continentOceania   -7.9088061  0.2823442  -28.011225 0.0000000   -8.4624807   -7.3551314
dt                  0.0000636  0.0000219    2.899851 0.0037687    0.0000206    0.0001067

Worldwide model:
term                 estimate  std_error   statistic   p_value     conf_low    conf_high
intercept          19.0829649  0.3154745   60.489713 0.0000000    18.463019   19.7029112
dt                  0.0000704  0.0000323    2.182612 0.0295683     0.000007    0.0001339

We first fit an interaction model for the first regression table, but found that the confidence intervals for all of the date rows (the slopes of our graphs) included 0. This means that none of the interaction effects was statistically significant: we found no evidence that the slope differs between continents, so we chose the simpler parallel slopes model instead.

The top regression table models the change in temperature over time by continent. Africa is the baseline level (it comes first alphabetically), so the intercept row gives Africa’s predicted average temperature at dt = 0, and the remaining continent rows give the difference in average temperature between each continent and Africa. The variable dt is the numerical date, stored as the number of days since January 1, 1970, so the dt row gives the change in average temperature for every one-day increase.

The second regression table models the change in temperature over time worldwide. Here the intercept row gives the predicted worldwide average temperature at dt = 0 (January 1, 1970), and the dt row gives the change in average temperature for every one-day increase.
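To see how the intercepts and the dt slope combine into fitted values, the sketch below assumes that dt enters the model as the number of days since January 1, 1970, which is how R and many other tools store dates. The constant names are illustrative.

```python
from datetime import date

# Assumption: dt is stored as days since 1970-01-01.
ORIGIN = date(1970, 1, 1)

# Estimates copied from the two regression tables (as printed).
AFRICA_INTERCEPT, CONTINENT_SLOPE = 23.8261954, 0.0000636
WORLD_INTERCEPT, WORLD_SLOPE = 19.0829649, 0.0000704


def days_since_origin(d):
    """Numeric value of date d under the assumed 1970-01-01 origin."""
    return (d - ORIGIN).days


def fitted(intercept, slope, d):
    """Predicted average temperature (degrees C) on date d."""
    return intercept + slope * days_since_origin(d)


for d in (date(1975, 2, 1), date(1975, 3, 1)):
    print(d.isoformat(), days_since_origin(d),
          round(fitted(AFRICA_INTERCEPT, CONTINENT_SLOPE, d), 3),
          round(fitted(WORLD_INTERCEPT, WORLD_SLOPE, d), 3))
```

This prints 1857 days, 23.944 and 19.214 for February 1, 1975, and 1885 days, 23.946 and 19.216 for March 1, 1975. These match the fitted values in the residual tables in Section 4, which is what supports reading dt = 0 as January 1, 1970.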

3.1 Statistical interpretation

The equations for average temperature for each continent (obtained from the first regression table) are the following, where dt is the date measured in days since January 1, 1970:

Africa: \(\widehat{\text{Average Temperature}} = 23.8261954 + 0.0000636 \cdot \text{dt}\)

Americas: \(\widehat{\text{Average Temperature}} = 21.0679065 + 0.0000636 \cdot \text{dt}\)

Asia: \(\widehat{\text{Average Temperature}} = 20.9577496 + 0.0000636 \cdot \text{dt}\)

Europe: \(\widehat{\text{Average Temperature}} = 8.1145052 + 0.0000636 \cdot \text{dt}\)

Oceania: \(\widehat{\text{Average Temperature}} = 15.9173893 + 0.0000636 \cdot \text{dt}\)
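Each continent’s intercept is Africa’s baseline intercept plus that continent’s offset, while all continents share the dt slope. A small sketch of that arithmetic, using the estimates from the first regression table:

```python
BASELINE = 23.8261954      # Africa intercept (first regression table)
SLOPE = 0.0000636          # shared dt slope in the parallel slopes model
OFFSETS = {                # continent rows: difference from Africa
    "Africa": 0.0,
    "Americas": -2.7582889,
    "Asia": -2.8684458,
    "Europe": -15.7116902,
    "Oceania": -7.9088061,
}

# Intercept for each continent = baseline + offset, rounded as in the tables.
intercepts = {name: round(BASELINE + off, 7) for name, off in OFFSETS.items()}
for name, b0 in intercepts.items():
    print(f"{name}: {b0:.7f} + {SLOPE:.7f} * dt")
```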

The equation for average temperature worldwide (obtained from the second regression table) is the following:

\(\widehat{\text{Average Temperature}} = 19.0829649 + 0.0000704 \cdot \text{dt}\)

These equations show that for every continent, and worldwide, each one-day increase in date is associated with an increase in average temperature. The slope is the same for every continent because we used a parallel slopes model, while the worldwide slope is slightly larger. For example, in Africa the predicted average temperature rises by 0.0000636 degrees Celsius per day, from about 23.8 degrees Celsius at dt = 0 (January 1, 1970, before our data begin) to about 23.9 degrees Celsius at our first observation on February 1, 1975. Worldwide, the predicted average temperature rises by 0.0000704 degrees Celsius per day, from about 19.1 degrees Celsius at dt = 0 to about 19.2 degrees Celsius on February 1, 1975. In other words, for each continent and worldwide, average temperature has a positive association with date.
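Because a per-day slope is hard to picture, the sketch below converts the two slopes into degrees per year and per decade. Using 365.25 days per year is an assumption made only for this conversion.

```python
DAYS_PER_YEAR = 365.25  # assumed average year length for the conversion

SLOPES_PER_DAY = {"each continent": 0.0000636, "worldwide": 0.0000704}

# Per-day slope times days per year gives degrees C per year.
per_year = {label: b * DAYS_PER_YEAR for label, b in SLOPES_PER_DAY.items()}
for label, rate in per_year.items():
    print(f"{label}: {rate:.4f} degrees C per year, {10 * rate:.3f} per decade")
```

This gives about 0.023 degrees Celsius per year (0.23 per decade) for each continent and about 0.026 degrees Celsius per year (0.26 per decade) worldwide.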

3.2 Non-statistical interpretation

We analyzed changes in temperature both for each continent and worldwide. The results show that as time goes on, each day that passes is associated with an increase in temperature for each continent and worldwide. We analyzed the change in average temperature worldwide in addition to each continent to help mitigate the effect of any potential outliers in the change in average temperature by continent. The worldwide trend is consistent with the continent-level results, which supports the conclusion, visible in the equations above, that temperature is increasing at a meaningful rate in the big picture.


4 Inference for multiple regression

To examine the confidence intervals and p-values from our two regression tables, we focus on the date (dt) rows. The other rows describe only the intercepts of our graphs: Africa’s predicted average temperature at dt = 0 (January 1, 1970), and each other continent’s difference from Africa. These continent rows all have p-values that round to 0, so we reject the null hypothesis that each continent’s intercept is the same as Africa’s.

In the dt row of the top regression table (average temperature by continent), the 95% confidence interval is [0.0000206, 0.0001067]. This means we are 95% confident that the true slope, the change in temperature in degrees Celsius for every one-day increase, lies between 0.0000206 and 0.0001067. The p-value is 0.0037687, which is below both α = 0.05 and α = 0.01, so we reject the null hypothesis that the slope for date is 0 at either significance level.

In the dt row of the bottom regression table (average temperature worldwide), the 95% confidence interval is [0.0000070, 0.0001339]. As above, this means we are 95% confident that the true worldwide slope lies between 0.0000070 and 0.0001339 degrees Celsius per day. The p-value here is 0.0295683, larger than in the top table: we reject the null hypothesis that the slope for date is 0 at α = 0.05, but not at the stricter α = 0.01.
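The intervals and p-values for the dt rows can be approximately reconstructed from each row’s estimate and standard error. The sketch below uses a normal approximation (estimate ± 1.96 standard errors, two-sided p-value from the standard normal). The printed tables may instead use a t distribution, and the printed estimates are rounded, so the last digits can differ slightly; the conclusions at α = 0.05 and α = 0.01 are the same.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# (estimate, std_error) for the dt rows, as printed in the two tables.
ROWS = {
    "by continent": (0.0000636, 0.0000219),
    "worldwide": (0.0000704, 0.0000323),
}


def summarize(est, se, level=0.95):
    """Normal-approximation confidence interval, z statistic, and two-sided p-value."""
    z_crit = Z.inv_cdf(0.5 + level / 2)      # about 1.96 for a 95% interval
    stat = est / se
    p = 2 * (1 - Z.cdf(abs(stat)))
    return est - z_crit * se, est + z_crit * se, stat, p


for name, (est, se) in ROWS.items():
    lo, hi, stat, p = summarize(est, se)
    print(f"{name}: CI [{lo:.7f}, {hi:.7f}], z = {stat:.2f}, p = {p:.4f}, "
          f"reject at 0.05: {p < 0.05}, reject at 0.01: {p < 0.01}")
```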

Our confidence intervals and p-values rely on the assumptions of the normal model, which do not hold here. The residual scatterplots, both by continent and worldwide, show a clear pattern (a positive linear relation). In addition, while the residual histograms for Africa and the Americas have a roughly normal spread, those for the other three continents do not: each appears to contain four separate normal distributions. This could be because the residuals vary with the four seasons, but we do not yet have the tools to know this for certain. Whatever the reason, the non-normality of most of our residual histograms and the clear pattern in our residual scatterplots mean that our confidence intervals and p-values should be taken with a grain of salt. They might be accurate, but our data do not ensure this.

Parallel slopes model residuals:
ID  average_temp_by_continent  continent  dt          average_temp_by_continent_hat  residual
1   23.395                     Africa     1975-02-01  23.944                         -0.550
2   24.283                     Africa     1975-03-01  23.946                          0.336
3   24.808                     Africa     1975-04-01  23.948                          0.860
4   24.680                     Africa     1975-05-01  23.950                          0.730
5   24.039                     Africa     1975-06-01  23.952                          0.087
6   23.504                     Africa     1975-07-01  23.954                         -0.450

Worldwide model residuals:
ID  worldwide_temp  dt          worldwide_temp_hat  residual
1   15.930          1975-02-01  19.214              -3.284
2   17.786          1975-03-01  19.216              -1.429
3   19.546          1975-04-01  19.218               0.328
4   21.216          1975-05-01  19.220               1.996
5   22.054          1975-06-01  19.222               2.832
6   22.584          1975-07-01  19.224               3.360


5 Conclusion

In this analysis, we first plotted the change in temperature (degrees Celsius) over time for each continent: Africa, the Americas, Asia, Europe, and Oceania. We were then curious about the change in temperature over time for the entire world, so we created a graph of worldwide temperature change over time. Initially, our data set caused a lot of trouble because it includes observations from as far back as the 1700s. There was a lot of data to work with, but the data did not really become clear until about the 1900s: most of the earlier records either listed countries that no longer exist or had blank temperature entries. To clean up our data, we restricted the data set to dates after January 1975. Although this was necessary for clarity, it limits our sample size, which widens our confidence intervals and makes our estimates less precise.

The results show that from 1975 to 2013 there is, on average, an associated increase of 0.0000704 degrees Celsius per day worldwide. The associated increase is the same for every continent, 0.0000636 degrees Celsius per day, because we used a parallel slopes model, which gives all continents one shared slope.

However, our confidence intervals and p-values rely on the normal model, which does not hold. As seen in Section 4, our residual scatterplots, both by continent and worldwide, show a clear pattern (a positive linear relation), and while the residual histograms for Africa and the Americas have a roughly normal spread, those for the other three continents do not. The non-normality of most of our residual histograms and the clear pattern in our residual scatterplots mean that our confidence intervals and p-values should be taken with a grain of salt: they might be accurate, but our data do not ensure this.

Our model does not account for the seasonal fluctuations in our data. The extreme fluctuations visible in our EDA graphs are due to temperature changes that occur with the seasons. In the future, we would need a more refined way to examine the data to account for these fluctuations, but we do not have the tools to do that at this time.
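One standard way to handle seasonality, offered here only as a hypothetical sketch that was not applied to our data, is to add calendar month as a categorical predictor. By the Frisch-Waugh-Lovell theorem, the date slope in that model equals the slope obtained after subtracting each calendar month’s mean from both temperature and date. The series below is synthetic, with a known trend and a 5 degree seasonal cycle, so we can check that the method recovers the trend; the names and numbers are illustrative assumptions.

```python
import math

# Synthetic data (NOT our data set): a known linear trend plus a
# 5 degree C seasonal cycle, so we can check the trend is recovered.
TRUE_SLOPE = 0.0001                     # degrees C per day, chosen for illustration
observations = []                       # (day, calendar month 0-11, temperature)
for i in range(12 * 10):                # ten years of monthly readings
    day = round(i * 365.25 / 12)        # approximate day of each reading
    month = i % 12
    season = 5 * math.sin(2 * math.pi * month / 12)
    observations.append((day, month, 19 + TRUE_SLOPE * day + season))


def slope_with_month_effects(obs):
    """Date slope from a model with one intercept per calendar month.

    Equivalent (Frisch-Waugh-Lovell) to demeaning day and temperature
    within each month, then fitting a line through the origin.
    """
    by_month = {}
    for day, month, temp in obs:
        by_month.setdefault(month, []).append((day, temp))
    num = den = 0.0
    for pairs in by_month.values():
        day_bar = sum(d for d, _ in pairs) / len(pairs)
        temp_bar = sum(t for _, t in pairs) / len(pairs)
        for d, t in pairs:
            num += (d - day_bar) * (t - temp_bar)
            den += (d - day_bar) ** 2
    return num / den


print(f"{slope_with_month_effects(observations):.7f}")   # 0.0001000
```

Because the seasonal term is constant within each calendar month, it drops out after demeaning, and the slope is estimated from year-to-year changes in the same month only.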

Overall, this analysis shows an associated increase in temperature worldwide and in each continent from 1975 to 2013. If our estimates are correct and the linear trend continues (all else being equal), the worldwide temperature will increase by one degree Celsius roughly every 39 years (1 / 0.0000704 ≈ 14,200 days) if humans don’t act on climate change. This is extremely dangerous: according to an article by the Washington Post, “dangerous warming” begins when the Earth’s temperature rises two degrees Celsius above “pre-industrial temperatures.” Our data range from 1975 to 2013, and industrialization began long before then, so some warming above pre-industrial levels had presumably already occurred before our data begin. According to our calculations, if we don’t change how we treat our planet, it will not be long before “dangerous warming” begins.
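The 39-year figure comes from inverting the worldwide slope and converting days to years. The sketch below shows the arithmetic for both slopes, assuming 365.25 days per year and that the linear trend continues unchanged.

```python
DAYS_PER_YEAR = 365.25  # assumed average year length

SLOPES_PER_DAY = {"worldwide": 0.0000704, "each continent": 0.0000636}

# 1 / slope = days needed for a 1 degree C rise; divide by days per year.
years_per_degree = {label: 1 / s / DAYS_PER_YEAR for label, s in SLOPES_PER_DAY.items()}
for label, years in years_per_degree.items():
    print(f"{label}: about {years:.1f} years per 1 degree C")
```

This gives about 38.9 years worldwide and about 43.0 years for each continent under the parallel slopes model.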