Note: this document can be printed from your browser but is better viewed online, either by following the link on the Data Skills in Geography website or directly at http://rpubs.com/profrichharris/teaching-world-development

1 Introduction

Development and inequality are important topics to geographers, policy makers and to international organisations such as the World Bank, the United Nations, Oxfam and the International Red Cross and Red Crescent Movement, amongst many others. Of course, the people for whom it matters most are those most affected - those who do not have the right to life, liberty and security of person that is enshrined in the Universal Declaration of Human Rights.

Within the subject content for GCSE geography, and under the theme of global economic development issues, students are asked to study the demographic, socio-economic, technological and political development of a poorer country or newly emerging country with respect to the wider political, social and environmental context within which the country is placed. Within the AS and A level content, patterns of human development and life expectancy can be studied as part of the Global systems and global governance theme. The theme is well suited to using quantitative information - data - to explore what is happening across the world. Data can challenge our misconceptions. For example, it often surprises people to learn that economic inequalities between countries are falling (it’s inequality within countries that is becoming a source of growing concern): see this post on the World Bank blog site.

Data provide evidence, and evidence forms knowledge. As part of the Data Skills in Geography Project we have developed a number of online teaching resources that include a discussion about inequality in the UK and about the influential book The Spirit Level. They also include a Short Introduction to Quantitative Geography in which we outline why working with real data is an important skill for geographers to learn.

This module makes use of the World Development Indicators provided by the World Bank. The data tables can be viewed at http://wdi.worldbank.org/tables, with graphical profiles for individual countries available at http://www.worldbank.org/en/country (select the country and then its data; Afghanistan is interesting).

An alternative and enjoyable way of viewing these and other data, as well as exploring changes over time, is through the Gapminder website, where you can view and download (for educational purposes) a film entitled Don’t Panic - How to End Poverty in 15 years, which is packed full of interesting facts and figures, and very creative ways of presenting the data.

2 Getting Started

A page on the Gapminder website states, somewhat provocatively, that “The term ‘Developing Countries’ might have made sense once. Today it’s impossible to make a clear distinction between ‘developing’ and ‘developed’ countries.” What is the evidence for this?

  • Play the visualisation with the default settings and watch what happens between 1800 and 2014.
  • How has the situation between developing and developed countries changed?
  • Is it “impossible” to make a clear distinction between developing and developed countries by 2014?
  • Should the words developing and developed be revised?
  • Try changing the variables from Life expectancy and Children per woman to explore other combinations from the list available (explore, for example, the health variables). Are there any for which the regional differences remain clear?

3 Looking at life expectancy in 2014

3.1 Mapping the data

Whatever we think of the words developed and developing, and of their relevance to the modern world, nobody would deny that there are differences between countries in regard to key indicators of development. An obvious example is life expectancy at birth, which is mapped in Figure 1, below, using 2014 data. If you are using the online version of this document then you can move your mouse across the map and it will reveal the name of the country and its life expectancy (you can also zoom in and out of the map and pan around it). The map is a choropleth map, widely used in geography, where the colour of the shading indicates the amount of what’s been measured in each place - in this case, the ‘amount’ of life expectancy in each country. (Some of the problems with choropleth maps are discussed here.)

For this map, lower life expectancies are shaded in a darker red. The data are missing for a few countries.

  • Look at the map: can you detect any geographical patterns on it?
  • Look for clusters of countries where the life expectancy for one country is similar to that of its neighbours but dissimilar to other clusters of countries further away.


Figure 1: Map of Life expectancy at birth (years) in 2014

What the map shows is the geographical distribution of the data - that some countries have higher (or lower) life expectancies than others. The ‘top ten’ countries with the highest life expectancy are,

##          Country life.expectancy population
## 1          Japan           83.59  127131800
## 2          Spain           83.08   46480882
## 3    Switzerland           82.85    8188649
## 4          Italy           82.69   60789140
## 5      Singapore           82.65    5469724
## 6         France           82.37   66495940
## 7  Liechtenstein           82.26      37286
## 8      Australia           82.25   23464086
## 9     Luxembourg           82.21     556319
## 10   Korea, Rep.           82.16   50423955

and those with the lowest are,

##                      Country life.expectancy population
## 189            Guinea-Bissau           55.16    1800513
## 190               Mozambique           55.03   27216276
## 191                  Nigeria           52.75  177475986
## 192                   Angola           52.27   24227524
## 193                     Chad           51.56   13587053
## 194            Cote d'Ivoire           51.56   22157107
## 195             Sierra Leone           50.88    6315627
## 196 Central African Republic           50.66    4804316
## 197                  Lesotho           49.70    2109197
## 198                Swaziland           48.93    1269112

The average life expectancy is 71.37 years. This is the mean average by country but ignores the fact that the countries have different numbers of people living within them. To correct for this, the weighted average (weighting by population size) is 71.44. This is the better estimate of the world average life expectancy although, as it turns out, there is very little difference between the two.
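The difference between a simple mean and a population-weighted mean can be sketched in a few lines of Python (the document itself was produced in R). The example uses only four countries taken from the tables above, so it is an illustration of the method and will not reproduce the world figures; the function names are made up for this sketch.

```python
# Life expectancy (years) and population for four countries copied from the
# tables above. This is an illustrative subset, NOT the full data set.
countries = {
    "Japan": (83.59, 127_131_800),
    "Spain": (83.08, 46_480_882),
    "Nigeria": (52.75, 177_475_986),
    "Swaziland": (48.93, 1_269_112),
}


def simple_mean(values):
    """Unweighted mean: every country counts equally."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted mean: each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


life = [le for le, _ in countries.values()]
pop = [p for _, p in countries.values()]

print("Mean by country:         ", round(simple_mean(life), 2))
print("Mean weighted by people: ", round(weighted_mean(life, pop), 2))
```

In this small subset the two averages differ noticeably because the populations are so unequal; across all countries the difference happens to be small.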

  • Go back to the map: can you find a country with a life expectancy that is close to the world average?
  • How many years longer than the world population average are people expected to live in Japan?
  • How many years less than the world population average are people expected to live in Swaziland?

4 How is life expectancy distributed across the world?

The map has revealed something important but not unexpected - life expectancy is not the same everywhere but varies, with some countries having a life expectancy above the average and some having a life expectancy below it. We can infer from the map’s legend that the life expectancies range from about 50 to 80 years.

  • Why does life expectancy vary between countries? What are some of the contributing factors?

4.1 Histograms

A map displays the geographical distribution of the data. Without the map we can still look at the numeric distribution. A common way of doing so is to use a histogram. To produce a histogram, the countries are placed into and counted in groups: for example, those that have a life expectancy from 45 to less than 50 years; those that are from 50 years to less than 55 years; those from 55 to less than 60 years; and so forth. The ‘span’ of each group, 5 years in this example, is known as the bin width. The resulting histogram is shown in Figure 2.

  • Which life expectancies are the most common in the histogram (which has the highest count)?
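The grouping step behind a histogram can be sketched as follows. This is a minimal Python illustration that assumes bins of the form "from 45 to less than 50" and so on; it uses only the twenty countries listed in the two tables above (ten highest, ten lowest), so the counts are not those of Figure 2.

```python
import math
from collections import Counter

# The twenty life expectancies listed in the tables above (a subset only).
life_expectancy = [
    83.59, 83.08, 82.85, 82.69, 82.65, 82.37, 82.26, 82.25, 82.21, 82.16,
    55.16, 55.03, 52.75, 52.27, 51.56, 51.56, 50.88, 50.66, 49.70, 48.93,
]


def bin_counts(values, bin_width):
    """Count how many values fall in each bin [lower, lower + bin_width)."""
    counts = Counter()
    for v in values:
        lower = math.floor(v / bin_width) * bin_width  # lower edge of the bin
        counts[lower] += 1
    return dict(sorted(counts.items()))


counts = bin_counts(life_expectancy, 5)
for lower, n in counts.items():
    # One '#' per country gives a rough text version of the histogram bars.
    print(f"{lower} to <{lower + 5}: {'#' * n}")
```

Changing the bin width (the second argument) changes how coarse or detailed the histogram looks.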


Figure 2: Histogram of the life expectancies by country in 2014

4.2 Dot plots

An alternative to the histogram is a dot plot, such as the one shown in Figure 3. This has rounded each country’s life expectancy to the nearest whole year and added a ‘dot’ to the chart to represent each country and its life expectancy. As with the histogram, the heights of the columns represent how common each life expectancy is and, to add further information, each dot is shaded according to the World Bank’s classification of countries by income group (2014 classification).

  • Looking at Figure 3, what is the most common life expectancy among the countries in 2014 (rounded to the nearest whole year)?
  • Which dot on the chart represents Japan and which represents Swaziland?
  • Can you find any relationship between which income group a country belongs to and its life expectancy?
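The rounding-and-stacking idea behind a dot plot can be sketched in Python as below. Again, only the twenty countries listed in the earlier tables are used, so this does not reproduce Figure 3, and the shading by income group is left out.

```python
from collections import Counter

# The twenty life expectancies listed in the earlier tables (a subset only).
life_expectancy = [
    83.59, 83.08, 82.85, 82.69, 82.65, 82.37, 82.26, 82.25, 82.21, 82.16,
    55.16, 55.03, 52.75, 52.27, 51.56, 51.56, 50.88, 50.66, 49.70, 48.93,
]


def dot_plot_counts(values):
    """Round each value to the nearest whole year and count the dots per year.

    Note: Python's round() sends exact halves to the nearest even number;
    none of the values here is an exact half, so that does not matter.
    """
    return Counter(round(v) for v in values)


stacks = dot_plot_counts(life_expectancy)
for year in sorted(stacks):
    # One 'o' per country: the longer the row, the more common the value.
    print(f"{year}: {'o' * stacks[year]}")
```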


Figure 3: Dotplot of the life expectancies by country in 2014

The mean average life expectancy by income group and weighted by population size is

##         High income Upper middle income Lower middle income 
##               79.45               74.69               67.04 
##          Low income 
##               63.64

People in the High income group have the highest life expectancy, on average; those in the Low income group have the lowest. However, there is variation within the groups. The standard deviation is a measure of how much some data vary around their mean (the higher the standard deviation, the greater the variation). The standard deviations measuring the variation in life expectancy within the income groups are

## Lower middle income          Low income Upper middle income 
##                6.80                5.38                5.17 
##         High income 
##                4.06

The greatest variation is within the Lower middle income group, and the least is within the High income group. In other words, there is greater difference between countries in the Lower middle income group than there is in the High income group.

  • Can you think of any reasons for this?
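To make the idea of a standard deviation concrete, here is a minimal Python sketch. It assumes the sample form of the formula (dividing by n - 1), and it compares two illustrative groups, the ten highest and ten lowest life expectancies from the earlier tables. These are not the World Bank income groups, so the results will not match the figures above.

```python
import math
import statistics

# Two illustrative groups copied from the earlier tables (NOT income groups).
highest_ten = [83.59, 83.08, 82.85, 82.69, 82.65, 82.37, 82.26, 82.25, 82.21, 82.16]
lowest_ten = [55.16, 55.03, 52.75, 52.27, 51.56, 51.56, 50.88, 50.66, 49.70, 48.93]


def sample_sd(values):
    """Sample standard deviation: spread of the values around their mean."""
    mean = sum(values) / len(values)
    squared_gaps = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_gaps) / (len(values) - 1))


for name, group in [("Highest ten", highest_ten), ("Lowest ten", lowest_ten)]:
    # statistics.stdev gives the same answer; it is printed as a cross-check.
    print(name, round(sample_sd(group), 2), round(statistics.stdev(group), 2))
```

The group whose values are more spread out has the larger standard deviation.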

4.3 Statistical summaries

Both the dot plot and the histogram show the ‘shape’ of the data, which is how they are distributed between their lowest and highest values, and which values are most typical. Graphics are visually appealing but we don’t have to use them. We could use a simple statistical summary instead. Table 1 shows that the life expectancies for the countries range from a minimum of 48.93 to a maximum of 83.59, with a mean average (which we already know) of 71.37 and a median average of 73.5.

The difference between the mean and median averages is that the mean is obtained by adding all the values together and then dividing by the number of values there are, whereas the median is the value found halfway along the data if they are sorted in rank order from lowest to highest (the median is the value in the middle). The only additional information shown in Table 1 is the first and third quartiles, which together define the interquartile range. These are the values we obtain if the data are sorted in rank order and we take the value that is one quarter (25 per cent) of the way along from lowest to highest, and the value that is three-quarters (75 per cent) of the way. Half of all the data lie between these values, which means they give an indication of the typical range of the data (the ‘mid-spread’), ignoring the highest and lowest values. Here the interquartile range is from 65.88 to 77.36.

Together, the mean, median, minimum, maximum and first and third quartiles provide a six number summary of the data.

Table 1: A six number summary of the life expectancy data

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   48.93   65.88   73.50   71.37   77.36   83.59
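A six number summary of this kind can be sketched in Python as follows. The quartiles are computed with the "inclusive" method of the standard library, which is intended to mirror the linear interpolation used for the table above; that correspondence is an assumption of this sketch. The data are the twenty countries from the earlier tables only, so the numbers will differ from Table 1.

```python
import statistics

# The twenty life expectancies listed in the earlier tables (a subset only).
life_expectancy = [
    83.59, 83.08, 82.85, 82.69, 82.65, 82.37, 82.26, 82.25, 82.21, 82.16,
    55.16, 55.03, 52.75, 52.27, 51.56, 51.56, 50.88, 50.66, 49.70, 48.93,
]


def six_number_summary(values):
    """Return min, first quartile, median, mean, third quartile and max."""
    # quantiles(..., n=4) returns the three cut points that split the sorted
    # data into quarters: first quartile, median and third quartile.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.mean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


summary = six_number_summary(life_expectancy)
for label, value in summary.items():
    print(f"{label:>8}: {value:.2f}")
```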

5 Mobile phone subscriptions in 2014

The World Development Indicators include the number of mobile phone subscriptions in each country. These could be used as a measure of development in information and communication technology. However, there is a problem: mapping these data will quickly reveal that the most populous countries tend to have the most subscriptions. This does not mean that they are more developed, only that there are more people living in them.

A more meaningful comparison of countries must allow for the variation in population size between them. We can, for example, express the number of subscriptions as a rate (or ratio) such as the number of subscriptions per 100 of the population (which is calculated by dividing the number of subscriptions by one hundredth of the total population). The result is shown in Figure 4, with a six number summary of the same data in Table 2.
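The rate calculation is simple enough to sketch directly. The two countries and their figures below are invented purely for illustration; they are not taken from the World Development Indicators.

```python
def subscriptions_per_100(subscriptions, population):
    """Number of subscriptions per 100 people."""
    return subscriptions / (population / 100)


# Hypothetical figures: a large country with many subscriptions in total,
# and a small country with far fewer subscriptions in total.
large = subscriptions_per_100(subscriptions=90_000_000, population=120_000_000)
small = subscriptions_per_100(subscriptions=5_000_000, population=4_000_000)

print("Large country:", large, "per 100 people")
print("Small country:", small, "per 100 people")
```

The large country has many more subscriptions in total but the lower rate. A rate above 100 simply means there are more subscriptions than people.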

  • What is the average (mean and median) mobile phone subscription rate per country?
  • What is the interquartile range?
  • Using the map and also the interquartile range, name some of the countries that have a subscription rate below the typical range of the data, and also some of the countries that are above it.
  • Compare the subscription rate in North Korea with the rate in South Korea.
  • The country with the highest subscription rate is in the Middle East. Can you find it?

In fact, there are special administrative areas in parts of China that have higher subscription rates but have not been included on this map.


Figure 4: Map of the number of mobile phone subscriptions per 100 of the population

Table 2: A six number summary of the number of mobile phone subscriptions data

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.390  75.933 106.495 104.045 129.350 218.430

5.1 Boxplots

The information contained in a six number summary provides the basis of a box plot, a visual way of displaying the range, interquartile range and median of some data (the mean is not included). A box plot of the mobile phone data is shown in Figure 5. If you click on the chart (or look at it closely) and compare the values with those shown in Table 2, you will find that the median is represented by the horizontal line near the middle of the box and that the interquartile range is indicated by the bottom and top of the box. Extending out from the box are the ‘whiskers’ (which is why a box plot is sometimes also known as a box-and-whisker plot). The purpose of the whiskers is to show the range of the data and/or to indicate potential outliers. In our example, the lower whisker does extend down to the minimum of the data but the upper whisker does not reach to the maximum. That is because there is one country where the rate of mobile phone subscriptions is somewhat different from the rest. This is what is meant by an outlier - a number that seems unusually high or low when compared to the rest.

  • Which country is the outlier?
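One common convention, assumed here because the text does not say which rule was used for Figure 5, is that a whisker reaches no further than 1.5 times the interquartile range beyond the box; anything outside that limit is drawn as an outlier. The Python sketch below applies that rule to the values in Table 2.

```python
# Values copied from Table 2.
minimum, q1, q3, maximum = 6.390, 75.933, 129.350, 218.430


def whisker_limits(q1, q3, k=1.5):
    """Furthest points the whiskers may reach under the k x IQR rule.

    k = 1.5 is a widely used default; it is an assumption in this sketch.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower_limit, upper_limit = whisker_limits(q1, q3)

print("Whisker limits:", round(lower_limit, 2), "to", round(upper_limit, 2))
print("Minimum inside the limits?", minimum >= lower_limit)
print("Maximum inside the limits?", maximum <= upper_limit)
```

Under this assumption the minimum lies inside the limits and the maximum lies outside them, which is consistent with the description of the lower and upper whiskers given above.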


Figure 5: Boxplot of the mobile phone subscriptions data

A useful feature of box plots is that they can be used to compare groups of data. For example, the countries can be looked at by income group, as in Figure 6.

  • On average, which group of countries has the highest rate of mobile phone subscriptions?
  • Which group has the greatest range of values?
  • Which has the largest number of outliers?
  • What is the interquartile range for the lower middle (lower mid) income group? How does it compare with the interquartile range for the high income group?


Figure 6: Boxplots of the mobile phone subscriptions data by income group

5.2 Outliers and their effect on statistics

Statistics such as the median and the interquartile range are sometimes described as robust statistics because they are not affected greatly, if at all, by unusually high or low values. The middle value (the median) is the middle value regardless of how extreme the values are at either end of the data’s distribution. The interquartile range (the middle 50 per cent of the data) is also largely unaffected. The same is not true of either the mean or standard deviation. The mean is increased by unusually high values and decreased by unusually low ones. The standard deviation is raised by any values that widen the spread of the data. The presence of outliers can, in some circumstances, lead to misleading statistics: see When Numbers Mislead. Measures of mean average income, for example, are distorted by a small number of very high earning individuals: see If the line fits: inequality, statistics and The Spirit Level.
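The effect of a single outlier can be seen in a small Python sketch. The incomes below are invented for illustration; the second list is the same as the first except that the highest value has been replaced by a very large one.

```python
import statistics

# Hypothetical incomes (in thousands); only the last value differs.
typical = [18, 22, 25, 27, 30, 33, 40]
with_outlier = [18, 22, 25, 27, 30, 33, 4000]

for name, data in [("Typical", typical), ("With outlier", with_outlier)]:
    print(
        name,
        "| median:", statistics.median(data),
        "| mean:", round(statistics.mean(data), 1),
        "| standard deviation:", round(statistics.stdev(data), 1),
    )
```

The median is the same for both lists, whereas the mean and the standard deviation are pulled sharply upwards by the one extreme value.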

6 Life expectancy and access to clean water

6.1 Scatter plots

Figure 7 looks at the relationship between access in urban areas to what is described as an improved water source and life expectancy in 2014. An improved water source is one protected from outside contamination and therefore should be clean. The two variables - access to improved water and life expectancy - are shown together in a scatter plot where the X axis of the graph is for the access to water variable and the Y axis is for life expectancy.

7 Lines of best fit

A line of best fit has been added to the plot which reveals there is a positive (meaning upwards sloping) relationship between the variables: a higher percentage of the urban population with access to clean water tends to be associated with a higher life expectancy.

  • Does this relationship seem reasonable to you? Is it what you’d expect?
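How a line of best fit might be calculated can be sketched as below. The sketch assumes the usual least squares method, which the text does not state explicitly, and the five data points are invented for illustration rather than taken from Figure 7.

```python
def fit_line(xs, ys):
    """Least squares line through the points: returns (gradient, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Gradient: how the Y values move with the X values, on average.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    gradient = s_xy / s_xx
    # The fitted line always passes through the point (x_bar, y_bar).
    intercept = y_bar - gradient * x_bar
    return gradient, intercept


# Hypothetical data: % of urban population with improved water, life expectancy.
water = [60, 70, 80, 90, 100]
life = [52, 58, 61, 70, 74]

gradient, intercept = fit_line(water, life)
print("Gradient:", round(gradient, 2), "Intercept:", round(intercept, 2))
```

A positive gradient means the line slopes upwards: higher values of one variable tend to go with higher values of the other.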

Clicking on the two points in the bottom right of the graph reveals them to be Swaziland and Lesotho. Both have relatively high levels of access to improved water amongst their urban populations but their life expectancies are low.

  • Click on this table to identify a contributing factor to the low life expectancies in these countries

Towards the upper left of the chart is a territory with a life expectancy that is much higher than expected given that it has the lowest percentage of the urban population with access to improved water.

  • Click on the chart, identify the territory and offer an explanation for its position.


Figure 7: The relationship between clean water and life expectancy

7.1 Correlation

The upwards sloping line of best fit indicates that there is a positive correlation between access to clean water and life expectancy: it implies that as one increases so does the other. If the line had been downwards sloping (if higher levels of access to clean water were associated with lower life expectancies) then it would be a negative correlation. Correlations range from -1, which requires a downwards sloping line, to +1, which requires an upwards sloping one. Values of -1 and +1 will only arise if it is a line of perfect fit - that is, if it goes exactly through every point on the graph with none of the points positioned above or below it (if all the points lie along the line). That is not the case in Figure 7, for which the (Pearson) correlation is `r round(with(df2, cor(life.expectancy, water.source, use = "na.or.complete")), 3)` - less than +1 but still quite sizable. A value of 0 would imply there is no relationship between the variables or, more correctly, no relationship that can be described well by a straight line.
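The Pearson correlation can be computed from first principles, as in the following Python sketch. The three small data sets are invented to show the extreme and intermediate cases; they are not the data behind Figure 7.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]
perfect_up = [10, 20, 30, 40, 50]    # every point lies on a rising line
perfect_down = [50, 40, 30, 20, 10]  # every point lies on a falling line
scattered = [12, 18, 35, 33, 52]     # rising overall, but not exactly

print("Perfect upwards line:  ", round(pearson(x, perfect_up), 3))
print("Perfect downwards line:", round(pearson(x, perfect_down), 3))
print("Scattered points:      ", round(pearson(x, scattered), 3))
```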

7.2 Regression

The more technical name for the line of best fit shown in Figure 7 is a regression line. Regression is used throughout the sciences and social sciences to build statistical models and to make predictions. In this case, we can use it to predict life expectancy from the percentage of the urban population with access to an improved water source. The equation of a straight line can be expressed as \(y = mx + c\) where \(m\) is the gradient of the line and \(c\) is the y-intercept (the predicted value of the Y variable when the X variable is zero). For the line in Figure 7 the intercept value is 19.99 and the gradient is 0.54. From this, we can predict life expectancy when, for example, 75 per cent of the urban population has access to a clean water supply. It is 19.99 \(+\) 0.54 \(\times\) 75, which is approximately 60.5 years.

  • Check from Figure 7 that the predicted life expectancy (above) looks reasonable
  • What is the predicted life expectancy for a country where 90 per cent of the urban population has access to an improved water source?
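The prediction step can be written as a one-line function. The sketch below uses the rounded intercept (19.99) and gradient (0.54) quoted above, so its answers may differ in the second decimal place from figures calculated with unrounded coefficients; the function name is made up for this illustration.

```python
def predict_life_expectancy(water_access, intercept=19.99, gradient=0.54):
    """Predicted life expectancy (years) from y = m * x + c.

    water_access is the percentage of the urban population with access to an
    improved water source; the defaults are the rounded values quoted above.
    """
    return intercept + gradient * water_access


for pct in (75, 100):
    print(f"{pct} per cent access: {predict_life_expectancy(pct):.2f} years")
```

Predictions are most trustworthy within the range of the data shown in the scatter plot; the intercept (the prediction at zero per cent access) lies well outside it.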

8 Over to you

This module has only scratched the surface of the wide range of data that are available online to view or to download and which are collected to help us understand the worldwide differences in health, education, wealth, employment and other indicators of development, and how those geographies are changing over time.

Such data are not easy to collect but they are critical if we are to understand the areas of greatest human need and how successful we have been as an international community in making for a more equal world.

Further information can be found in the Atlas of Global Development: a visual guide to the World’s Greatest Challenges and by viewing the United Nations Human Development Data, which reports on changes between 1980 and 2015. An excellent online atlas to ‘compare countries’ statistics, learn about the Human Development Index and explore our changing world’ is available from Canadian Geographic.

Finally, the Atlas of Global Complexity and the Globe of Global Complexity offer some very interesting maps and visualisations that reveal the trading connections between countries. Use them to explore the interconnections between people and places brought by global trade, and how these have changed and developed over recent years.

8.1 Acknowledgements

The statistical output and graphs were produced in a free statistical and computing package called R, which is popular in University-level geography because of its analytical and visualisation capabilities, including its ability to draw maps and do geographical types of analysis. This document was created in RStudio for the Royal Geographical Society (with IBG). The Data Skills in Geography Project is funded by the Nuffield Foundation.

8.2 About the author

Richard Harris is a Professor of Quantitative Social Geography at the School of Geographical Sciences, University of Bristol, where he is also director of the University of Bristol Q-Step Centre, part of the multimillion pound national Q-Step initiative providing a step change in the quality of quantitative training given to social science students. He is also author of the book Quantitative Geography: the basics and co-author of Statistics for Geography and Environmental Sciences. He was the recipient, in 2014, of the Royal Geographical Society’s (with IBG) Taylor & Francis Award for excellence in the promotion and practice of teaching quantitative methods.