Assignment 3 STAT 545A - Daniel Dinsdale

Several tasks from Homework #3 on data aggregation were attempted. Some were completed successfully, whilst others were not. The following outlines what went right and wrong for each task attempted.

Preliminaries:

Before undertaking the R work, the required packages had to be loaded and the data imported.

library(plyr)
library(xtable)
gDat <- read.delim("gapminderDataFiveYear.txt")

Besides importing the Gapminder data with 'read.delim', this code loads the 'plyr' package for data aggregation and the 'xtable' package for formatting tables.

1) Get the maximum and minimum of GDP per capita for all continents in a “wide” format:

As shown below, I used the 'ddply' function from the 'plyr' package to create a 5x3 data frame called contGdp, with one row per continent and three variables: continent, maxGdpPercap and minGdpPercap (the maximum and minimum gross domestic product per capita). I then applied the 'arrange' function, with 'desc' on the maxGdpPercap column, to create a new data frame called contOrdGdp in which the continents are sorted by maximum GDP per capita, largest first.

contGdp <- ddply(gDat, ~continent, summarise, maxGdpPercap = max(gdpPercap), 
    minGdpPercap = min(gdpPercap))
contOrdGdp <- arrange(contGdp, desc(maxGdpPercap))
print(xtable(contOrdGdp), type = "html", include.rownames = FALSE)
continent maxGdpPercap minGdpPercap
Asia 113523.13 331.00
Europe 49357.19 973.53
Americas 42951.65 1201.64
Oceania 34435.37 10039.60
Africa 21951.21 241.17
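For readers without 'plyr', the same wide summary can be sketched in base R. This swaps in a different technique ('tapply' and 'order' instead of 'ddply' and 'arrange'), and because the Gapminder file is not reproduced here it runs on a small hypothetical data frame, toy; the names toy, toyWide and toyOrd are illustrative only.

```r
# Hypothetical toy stand-in for gDat (made-up numbers, NOT Gapminder values).
toy <- data.frame(
  continent = c("Asia", "Asia", "Europe", "Europe", "Africa", "Africa"),
  gdpPercap = c(500, 90000, 1000, 40000, 300, 20000),
  stringsAsFactors = FALSE
)

# Per-continent maximum and minimum; tapply() returns arrays named by continent.
mx <- tapply(toy$gdpPercap, toy$continent, max)
mn <- tapply(toy$gdpPercap, toy$continent, min)

# "Wide" layout: one row per continent, one column per statistic.
toyWide <- data.frame(
  continent    = names(mx),
  maxGdpPercap = as.vector(mx),
  minGdpPercap = as.vector(mn[names(mx)]),
  stringsAsFactors = FALSE
)

# Base-R counterpart of arrange(..., desc(maxGdpPercap)): largest maximum first.
toyOrd <- toyWide[order(toyWide$maxGdpPercap, decreasing = TRUE), ]
rownames(toyOrd) <- NULL
print(toyOrd)
```

Running the same lines on gDat instead of toy should reproduce contOrdGdp, though that has not been checked against the output above.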

With the continents ordered by maxGdpPercap, there appears to be little relation between this variable and minGdpPercap. For example, although Asia has the largest maximum GDP per capita (113523.13), its minimum (331.00) is the second smallest, which suggests a large disparity between the highest and lowest observations in Asia. Conversely, Oceania has only the fourth-largest maximum but by far the largest minimum (10039.60). On the whole, there seems to be little relationship between the extremes of minGdpPercap and maxGdpPercap.
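The comparison above can be made slightly more concrete with two quick summaries of the five rows of the table: the ratio of maximum to minimum GDP per capita on each continent, and a rank (Spearman) correlation between the maxima and minima. This is an illustrative sketch using only the values printed above; with five continents it is descriptive, not a formal test.

```r
# Maxima and minima copied from the contOrdGdp table above.
gdpExtremes <- data.frame(
  continent    = c("Asia", "Europe", "Americas", "Oceania", "Africa"),
  maxGdpPercap = c(113523.13, 49357.19, 42951.65, 34435.37, 21951.21),
  minGdpPercap = c(331.00, 973.53, 1201.64, 10039.60, 241.17),
  stringsAsFactors = FALSE
)

# Ratio of largest to smallest GDP per capita: a crude within-continent spread.
gdpExtremes$ratio <- round(gdpExtremes$maxGdpPercap / gdpExtremes$minGdpPercap, 1)
print(gdpExtremes[order(gdpExtremes$ratio, decreasing = TRUE), c("continent", "ratio")])

# Spearman (rank) correlation between the maxima and the minima.
rho <- cor(gdpExtremes$maxGdpPercap, gdpExtremes$minGdpPercap, method = "spearman")
print(rho)
```

Asia's ratio (about 343) is far larger than any other continent's and Oceania's (about 3.4) is the smallest, while the rank correlation between the maxima and minima works out to 0 for these five values.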

2) How is life expectancy changing over time on different continents?

In this case I created a 60x3 data frame (5 continents by 12 years) with the three variables continent, year and meanLifeExp (mean life expectancy).

contLifeExp <- ddply(gDat, continent ~ year, summarise, meanLifeExp = mean(lifeExp))
print(xtable(contLifeExp), type = "html", include.rownames = FALSE)
continent year meanLifeExp
Africa 1952 39.14
Africa 1957 41.27
Africa 1962 43.32
Africa 1967 45.33
Africa 1972 47.45
Africa 1977 49.58
Africa 1982 51.59
Africa 1987 53.34
Africa 1992 53.63
Africa 1997 53.60
Africa 2002 53.33
Africa 2007 54.81
Americas 1952 53.28
Americas 1957 55.96
Americas 1962 58.40
Americas 1967 60.41
Americas 1972 62.39
Americas 1977 64.39
Americas 1982 66.23
Americas 1987 68.09
Americas 1992 69.57
Americas 1997 71.15
Americas 2002 72.42
Americas 2007 73.61
Asia 1952 46.31
Asia 1957 49.32
Asia 1962 51.56
Asia 1967 54.66
Asia 1972 57.32
Asia 1977 59.61
Asia 1982 62.62
Asia 1987 64.85
Asia 1992 66.54
Asia 1997 68.02
Asia 2002 69.23
Asia 2007 70.73
Europe 1952 64.41
Europe 1957 66.70
Europe 1962 68.54
Europe 1967 69.74
Europe 1972 70.78
Europe 1977 71.94
Europe 1982 72.81
Europe 1987 73.64
Europe 1992 74.44
Europe 1997 75.51
Europe 2002 76.70
Europe 2007 77.65
Oceania 1952 69.25
Oceania 1957 70.30
Oceania 1962 71.09
Oceania 1967 71.31
Oceania 1972 71.91
Oceania 1977 72.85
Oceania 1982 74.29
Oceania 1987 75.32
Oceania 1992 76.94
Oceania 1997 78.19
Oceania 2002 79.74
Oceania 2007 80.72
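A long table of 60 rows makes trends within a continent hard to follow. As an optional presentation tweak (not part of the original assignment), the same numbers can be pivoted so that each continent becomes a column. The sketch below uses base R's 'tapply' on a few rows copied from the table; lifeSubset and wideLifeExp are hypothetical names.

```r
# A few rows copied from the contLifeExp table above (long format).
lifeSubset <- data.frame(
  continent   = rep(c("Africa", "Europe"), each = 3),
  year        = rep(c(1952, 1957, 1962), times = 2),
  meanLifeExp = c(39.14, 41.27, 43.32, 64.41, 66.70, 68.54),
  stringsAsFactors = FALSE
)

# Pivot to wide: rows are years, columns are continents. Each cell holds a
# single value, so mean() simply returns it.
wideLifeExp <- tapply(lifeSubset$meanLifeExp,
                      list(lifeSubset$year, lifeSubset$continent), mean)
print(wideLifeExp)

# Change between successive survey years, per continent.
print(apply(wideLifeExp, 2, diff))
```

Applied to the full contLifeExp, this would presumably give a 12x5 matrix with one column per continent.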

Mean life expectancy was higher in 2007 than in 1952 on every continent, but the rate of increase has not been consistent. For example, over the 55 years covered, Africa's mean life expectancy increased by around 15 years (39.14 to 54.81), whilst Asia's increased by about 24 years (46.31 to 70.73). Africa's progress also stalled after 1987, with the mean falling slightly from 53.63 in 1992 to 53.33 in 2002. Asia had the largest increase, though this is partly due to its low starting value. In 2007 every continent except Africa had a mean life expectancy above 70 years (from 70.73 in Asia to 80.72 in Oceania), while Africa's was around 55 years. This is the most striking result in the table, as it points to a persistent problem with life expectancy in Africa, even in the present day.
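The gains quoted above can be checked directly from the 1952 and 2007 rows of the table. This short sketch uses only values copied from the output; lifeExp1952, lifeExp2007 and africa are illustrative names.

```r
# 1952 and 2007 continental means, copied from the contLifeExp table above.
lifeExp1952 <- c(Africa = 39.14, Americas = 53.28, Asia = 46.31,
                 Europe = 64.41, Oceania = 69.25)
lifeExp2007 <- c(Africa = 54.81, Americas = 73.61, Asia = 70.73,
                 Europe = 77.65, Oceania = 80.72)

# Gain in mean life expectancy over the 55 years, largest first.
gain <- sort(lifeExp2007 - lifeExp1952, decreasing = TRUE)
print(round(gain, 2))

# Africa's means for 1987-2002: successive differences show the stall.
africa <- c(`1987` = 53.34, `1992` = 53.63, `1997` = 53.60, `2002` = 53.33)
print(round(diff(africa), 2))
```

The gains come out as 24.42 (Asia), 20.33 (Americas), 15.67 (Africa), 13.24 (Europe) and 11.47 (Oceania), and two of Africa's three differences after 1987 are negative.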

3) Compute a trimmed mean of life expectancy for different years:

Here I computed three variables: year, MeanLifeExp (the mean life expectancy across all countries in each year) and TrimLifeExp (the same mean computed with trim = 0.1, so that 10% of the observations, rounded down, are removed from each end before averaging). This created a 12x3 data frame named yearLifeExp.

yearLifeExp <- ddply(gDat, ~year, summarise, MeanLifeExp = mean(lifeExp), TrimLifeExp = mean(lifeExp, 
    trim = 0.1))
print(xtable(yearLifeExp), type = "html", include.rownames = FALSE)
year MeanLifeExp TrimLifeExp
1952 49.06 48.58
1957 51.51 51.27
1962 53.61 53.58
1967 55.68 55.87
1972 57.65 58.01
1977 59.57 60.10
1982 61.53 62.12
1987 63.21 63.92
1992 64.16 65.19
1997 65.01 66.02
2002 65.69 66.72
2007 67.01 68.11
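To make the effect of trim = 0.1 concrete: R's mean(x, trim = 0.1) sorts the values, drops floor(0.1 n) of them from each end and averages the rest. The sketch below shows this on ten made-up life expectancies (hypothetical, not Gapminder values) with one unusually low value, and reproduces the built-in result by hand.

```r
# Hypothetical life expectancies for ten countries in a single year:
# one very low value, the rest clustered in the 60s.
x <- c(30, 60, 61, 62, 63, 64, 65, 66, 67, 68)

n <- length(x)
k <- floor(n * 0.1)                        # observations dropped from EACH end: 1
trimmedByHand <- mean(sort(x)[(k + 1):(n - k)])

print(mean(x))                             # 60.6, dragged down by the 30
print(mean(x, trim = 0.1))                 # 63.5
print(trimmedByHand)                       # 63.5, matches the built-in result
```

Because the low value sits much further from the centre than the high one, trimming both ends raises the average, which is one way a trimmed mean can exceed the ordinary mean.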

By inspection, trimming slightly lowers the mean in the first three years (1952, 1957 and 1962), although the 1962 difference is only 0.03 years. From 1967 onwards the pattern reverses and the trimmed mean is higher than the ordinary mean. The difference is again small, never exceeding about 1.1 years (in 2007). A trimmed mean above the ordinary mean suggests that the lowest life expectancies lie further from the centre than the highest ones, so removing both tails raises the average. Even so, a shift of around a year might not be negligible for some inferences drawn from these data.
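The year-by-year effect of trimming can be tabulated from the yearLifeExp output. This sketch only subtracts the two printed columns; trimEffect is an illustrative name.

```r
# MeanLifeExp and TrimLifeExp copied from the yearLifeExp table above.
year <- seq(1952, 2007, by = 5)
MeanLifeExp <- c(49.06, 51.51, 53.61, 55.68, 57.65, 59.57,
                 61.53, 63.21, 64.16, 65.01, 65.69, 67.01)
TrimLifeExp <- c(48.58, 51.27, 53.58, 55.87, 58.01, 60.10,
                 62.12, 63.92, 65.19, 66.02, 66.72, 68.11)

# Positive values: trimming raised the mean; negative: trimming lowered it.
trimEffect <- data.frame(year, difference = round(TrimLifeExp - MeanLifeExp, 2))
print(trimEffect)
```

The differences are negative only for 1952 to 1962 (from -0.48 to -0.03) and grow steadily afterwards, peaking at 1.10 in 2007.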

4) Get the maximum and minimum of GDP per capita for all continents in a “tall” format:

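For this question I presented the maximum and minimum values for each continent in a single column, GdpPercap, with a second column, factor, recording whether each value is the maximum or the minimum.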

GdpMaxMin <- ddply(gDat, ~continent, summarise, factor = c("Max Gdp", "Min Gdp"), 
    GdpPercap = c(max(gdpPercap), min(gdpPercap)))
print(xtable(GdpMaxMin), type = "html", include.rownames = FALSE)
continent factor GdpPercap
Africa Max Gdp 21951.21
Africa Min Gdp 241.17
Americas Max Gdp 42951.65
Americas Min Gdp 1201.64
Asia Max Gdp 113523.13
Asia Min Gdp 331.00
Europe Max Gdp 49357.19
Europe Min Gdp 973.53
Oceania Max Gdp 34435.37
Oceania Min Gdp 10039.60
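The tall table can also be derived from the wide table of Question 1 rather than recomputed. The sketch below does this in base R using the values printed earlier; the label column is called stat here (a hypothetical name chosen to avoid confusion with R's factor() function), and contGdpWide and contGdpTall are illustrative names.

```r
# Wide table from Question 1 (values copied from the output above).
contGdpWide <- data.frame(
  continent    = c("Africa", "Americas", "Asia", "Europe", "Oceania"),
  maxGdpPercap = c(21951.21, 42951.65, 113523.13, 49357.19, 34435.37),
  minGdpPercap = c(241.17, 1201.64, 331.00, 973.53, 10039.60),
  stringsAsFactors = FALSE
)

# Tall layout: two rows per continent. rbind() builds a 2 x 5 matrix whose
# column-major flattening interleaves max, min, max, min, ...
contGdpTall <- data.frame(
  continent = rep(contGdpWide$continent, each = 2),
  stat      = rep(c("Max Gdp", "Min Gdp"), times = nrow(contGdpWide)),
  GdpPercap = as.vector(rbind(contGdpWide$maxGdpPercap, contGdpWide$minGdpPercap)),
  stringsAsFactors = FALSE
)
print(contGdpTall)
```

The result has the same ten rows, in the same order, as GdpMaxMin above.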

The table shows the same results as in Question 1, only in a different layout. To avoid repeating my earlier comments, I will leave this question here.

5) Count the number of countries with low life expectancy over time by continent (Unsuccessful):

Despite many hours of work, this task was unsuccessful. I wrote a function, lessThanCount, to count the number of countries with a life expectancy below a fixed value (here I tested 66) in a given year. The function worked for that hard-coded bound, but I could not get it to accept the bound as a varying argument, nor to apply it within each continent and year so as to give the number of countries with low life expectancy over time by continent. With more work I believe this could be done, but I did not have time to go further. My aim was to add this count as a new column to the lifeExpCount data frame below.

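# Count the rows of x whose year equals y and whose life expectancy is below 66.
# The cut-off of 66 is hard-coded.
lessThanCount <- function(x, y) {
    r <- 0
    for (i in seq_len(nrow(x))) {
        # skip rows belonging to other years
        if (x$year[[i]] != y) {
            next
        }
        if (x$lifeExp[[i]] < 66) {
            r <- r + 1
        }
    }
    return(r)
}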
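# Mean life expectancy by continent and year (the same summary as contLifeExp in Question 2);
# the intended count of low-life-expectancy countries was to be added as an extra column.
lifeExpCount <- ddply(gDat, continent ~ year, summarise, MeanGloLife = mean(lifeExp))
print(xtable(lifeExpCount), type = "html", include.rownames = FALSE)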
continent year MeanGloLife
Africa 1952 39.14
Africa 1957 41.27
Africa 1962 43.32
Africa 1967 45.33
Africa 1972 47.45
Africa 1977 49.58
Africa 1982 51.59
Africa 1987 53.34
Africa 1992 53.63
Africa 1997 53.60
Africa 2002 53.33
Africa 2007 54.81
Americas 1952 53.28
Americas 1957 55.96
Americas 1962 58.40
Americas 1967 60.41
Americas 1972 62.39
Americas 1977 64.39
Americas 1982 66.23
Americas 1987 68.09
Americas 1992 69.57
Americas 1997 71.15
Americas 2002 72.42
Americas 2007 73.61
Asia 1952 46.31
Asia 1957 49.32
Asia 1962 51.56
Asia 1967 54.66
Asia 1972 57.32
Asia 1977 59.61
Asia 1982 62.62
Asia 1987 64.85
Asia 1992 66.54
Asia 1997 68.02
Asia 2002 69.23
Asia 2007 70.73
Europe 1952 64.41
Europe 1957 66.70
Europe 1962 68.54
Europe 1967 69.74
Europe 1972 70.78
Europe 1977 71.94
Europe 1982 72.81
Europe 1987 73.64
Europe 1992 74.44
Europe 1997 75.51
Europe 2002 76.70
Europe 2007 77.65
Oceania 1952 69.25
Oceania 1957 70.30
Oceania 1962 71.09
Oceania 1967 71.31
Oceania 1972 71.91
Oceania 1977 72.85
Oceania 1982 74.29
Oceania 1987 75.32
Oceania 1992 76.94
Oceania 1997 78.19
Oceania 2002 79.74
Oceania 2007 80.72
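One possible way round the difficulty described above is to avoid the explicit loop and count with sum(lifeExp < bound) inside a group-wise summary, passing the bound as an ordinary argument. The sketch below is an assumption-laden illustration rather than a completed answer: it uses base R's 'aggregate' instead of 'ddply', runs on hypothetical toy data, and countBelow and nLow are made-up names.

```r
# Hypothetical toy data shaped like gDat (made-up values, NOT Gapminder data).
toy <- data.frame(
  country   = rep(c("A", "B", "C", "D"), each = 2),
  continent = rep(c("Africa", "Africa", "Europe", "Europe"), each = 2),
  year      = rep(c(1952, 2007), times = 4),
  lifeExp   = c(40, 55, 45, 70, 65, 78, 68, 80),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of countries with lifeExp below `bound`
# in each continent-year. The bound is an ordinary argument, so it can vary.
countBelow <- function(df, bound) {
  out <- aggregate(lifeExp ~ continent + year, data = df,
                   FUN = function(v) sum(v < bound))
  names(out)[names(out) == "lifeExp"] <- "nLow"
  out
}

res <- countBelow(toy, bound = 66)
print(res)

# With plyr, an analogous (untested) call on the real data would presumably be:
# ddply(gDat, continent ~ year, summarise, nLow = sum(lifeExp < 66))
```

With bound = 66 the toy data give two low-life-expectancy countries in Africa in 1952 and one in Europe in 1952; calling countBelow(toy, 50) instead shows the bound varying without editing the function.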