We will continue to look at the Gapminder data provided by Jenny Bryan for the STAT 545A class.
The data contains 6 variables (country, year, pop, continent, lifeExp, gdpPercap), with 1704 rows of data. A short summary of the data and the variables can be found in the following table:
| | Variable | Name | Info |
|---|---|---|---|
| 1 | country | Country | factor, 142 countries |
| 2 | year | Year | integer, 12 years, range: 1952 - 2007 |
| 3 | pop | Population | numeric, quantiles: 0%=60011, 25%=2793664, 50%=7023595.5, 75%=19585221.75, 100%=1318683096 |
| 4 | continent | Continent | factor, values: Asia, Europe, Africa, Americas, Oceania |
| 5 | lifeExp | Life Expectancy | numeric, quantiles: 0%=23.599, 25%=48.198, 50%=60.7125, 75%=70.8455, 100%=82.603 |
| 6 | gdpPercap | GDP per capita | numeric, quantiles: 0%=241.1658765, 25%=1202.06030925, 50%=3531.8469885, 75%=9325.462346, 100%=113523.1329 |
A quick look into the structure of the data is provided by the str function in R:
## 'data.frame': 1704 obs. of 6 variables:
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ pop : num 8425333 9240934 10267083 11537966 13079460 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ gdpPercap: num 779 821 853 836 740 ...
## NULL
Using the Gapminder data set, we will focus on data aggregation tasks using the plyr package written by Hadley Wickham. We will stress the importance of figures in data exploration by trying to extract information from the data set using only tables. As will soon become painfully obvious, a table is not the best display when one is trying to find interesting relationships, trends, unique properties, or secrets that may lie within the data.
We begin by investigating the quantitative variable, “GDP per capita”. We are interested in finding the countries with the lowest and highest GDP per capita for each continent, as well as the difference between the two. This is displayed in the table below, sorted by ascending order of minimum GDP per capita.
| | continent | minGdpPercap | maxGdpPercap | diffGdpPercap |
|---|---|---|---|---|
| 1 | Africa | 241.17 | 21951.21 | 21710.05 |
| 3 | Asia | 331.00 | 113523.13 | 113192.13 |
| 4 | Europe | 973.53 | 49357.19 | 48383.66 |
| 2 | Americas | 1201.64 | 42951.65 | 41750.02 |
| 5 | Oceania | 10039.60 | 34435.37 | 24395.77 |
There are several interesting points in this table. Africa contains the nation with the smallest GDP per capita, and its richest nation has the lowest maximum GDP per capita of any continent. Asia has the greatest gap between its minimum and maximum GDP per capita, and it is also home to the nation with the highest GDP per capita in the world. Oceania has the richest “poorest” country when compared to the other continents.
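A summary like the one above can be sketched with `ddply` and `summarize` from plyr. The `toy` data frame and its values are made up for illustration and stand in for the real Gapminder data; only the column names `continent` and `gdpPercap` come from the data set.

```r
library(plyr)

# Toy stand-in for the Gapminder data frame (values are made up)
toy <- data.frame(
  continent = c("Africa", "Africa", "Asia", "Asia", "Asia"),
  gdpPercap = c(300, 5000, 400, 90000, 12000),
  stringsAsFactors = FALSE
)

# Split by continent, compute the three summaries, and recombine
gdpRange <- ddply(toy, ~ continent, summarize,
                  minGdpPercap  = min(gdpPercap),
                  maxGdpPercap  = max(gdpPercap),
                  diffGdpPercap = max(gdpPercap) - min(gdpPercap))

# Sort by ascending minimum GDP per capita, as in the table
gdpRange <- arrange(gdpRange, minGdpPercap)
print(gdpRange)
```

Sorting after the aggregation keeps the split-apply-combine step separate from the presentation step.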
Let us try displaying the above table in “Tall” format to check if it is easier or harder to decipher the data:
| | continent | stat | type |
|---|---|---|---|
| 1 | Africa | 241.17 | min |
| 2 | Africa | 21951.21 | max |
| 3 | Africa | 21710.05 | diff |
| 4 | Americas | 1201.64 | min |
| 5 | Americas | 42951.65 | max |
| 6 | Americas | 41750.02 | diff |
| 7 | Asia | 331.00 | min |
| 8 | Asia | 113523.13 | max |
| 9 | Asia | 113192.13 | diff |
| 10 | Europe | 973.53 | min |
| 11 | Europe | 49357.19 | max |
| 12 | Europe | 48383.66 | diff |
| 13 | Oceania | 10039.60 | min |
| 14 | Oceania | 34435.37 | max |
| 15 | Oceania | 24395.77 | diff |
It seems that the “wide” table format is easier to read in this first example. This may be because the “tall” format spreads the statistics for each continent over several rows, so related numbers no longer sit side by side and we have to work harder to compare them. If I had to choose one table to display, I would keep the “wide” table.
The next immediate question that arises from looking at these tables is “Which country in each continent has the lowest/highest GDP per capita?”. It is a natural extension of the tables above, so we add the country for both the minimum and the maximum GDP per capita. We leave out the difference between the minimum and maximum GDP per capita in the next table.
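One way to go from the “wide” layout to the “tall” layout using only plyr is to let the function passed to `ddply` return several rows per continent. This is a minimal sketch; the `wide` data frame holds made-up values, and this may not be how the tables in this report were produced.

```r
library(plyr)

# Hypothetical wide summary, one row per continent (made-up values)
wide <- data.frame(
  continent     = c("Africa", "Asia"),
  minGdpPercap  = c(300, 400),
  maxGdpPercap  = c(5000, 90000),
  diffGdpPercap = c(4700, 89600),
  stringsAsFactors = FALSE
)

# For each continent, stack the three statistics into three rows
tall <- ddply(wide, ~ continent, function(x) {
  data.frame(stat = c(x$minGdpPercap, x$maxGdpPercap, x$diffGdpPercap),
             type = c("min", "max", "diff"),
             stringsAsFactors = FALSE)
})
print(tall)
```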
| | continent | minCntry | minGDP | maxCntry | maxGDP |
|---|---|---|---|---|---|
| 1 | Africa | Congo, Dem. Rep. | 241.17 | Libya | 21951.21 |
| 3 | Asia | Myanmar | 331.00 | Kuwait | 113523.13 |
| 4 | Europe | Bosnia and Herzegovina | 973.53 | Norway | 49357.19 |
| 2 | Americas | Haiti | 1201.64 | United States | 42951.65 |
| 5 | Oceania | Australia | 10039.60 | Australia | 34435.37 |
We were initially a little confused when Australia was found to be the country with both the minimum and the maximum GDP per capita in Oceania. If we delve further into the data set, we find two interesting facts. First, Oceania contains only two countries, Australia and New Zealand. Second, Australia had a smaller GDP per capita than New Zealand in 1952. However, GDP per capita grew faster in Australia than in New Zealand, so the maximum for the continent is Australia's value in 2007, as we read in the above table. Due to my ineptitude in geography, I cannot say much about the other countries in the table, except that the United States having the greatest GDP per capita in the Americas was not surprising.
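The lookup of the country behind each extreme can be sketched with `which.min` and `which.max` inside a function passed to `ddply`. The `toy` values are made up so that one country is both the poorest and the richest, mirroring the Oceania case; the `minYear` and `maxYear` columns are an addition for illustration and do not appear in the table above.

```r
library(plyr)

# Toy data: made-up values where Australia is poorest in 1952 and richest in 2007
toy <- data.frame(
  continent = rep("Oceania", 4),
  country   = c("Australia", "New Zealand", "Australia", "New Zealand"),
  year      = c(1952, 1952, 2007, 2007),
  gdpPercap = c(100, 110, 340, 250),
  stringsAsFactors = FALSE
)

# For each continent, find the rows holding the minimum and maximum GDP per capita
extremes <- ddply(toy, ~ continent, function(x) {
  lo <- which.min(x$gdpPercap)
  hi <- which.max(x$gdpPercap)
  data.frame(minCntry = x$country[lo], minYear = x$year[lo], minGDP = x$gdpPercap[lo],
             maxCntry = x$country[hi], maxYear = x$year[hi], maxGDP = x$gdpPercap[hi],
             stringsAsFactors = FALSE)
})
print(extremes)
```

Because the split is by continent only, the extremes are taken over all years, which is how the same country can appear in both columns.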
Next, we take a look at the spread of GDP per capita in each continent, and also the number of countries in each continent. We will look at the standard deviation of the GDP per capita variable.
| | continent | sdGdpPercap | nCountry |
|---|---|---|---|
| 1 | Africa | 2827.93 | 52 |
| 5 | Oceania | 6358.98 | 2 |
| 2 | Americas | 6396.76 | 25 |
| 4 | Europe | 9355.21 | 30 |
| 3 | Asia | 14045.37 | 33 |
The table is sorted from the smallest to the largest standard deviation of GDP per capita. Africa seems to have the smallest spread of GDP per capita, and Asia seems to have the largest. If we take a look at our first table, we can deduce that this may partly be explained by the small and large gaps between the minimum and maximum GDP per capita countries within those two continents. It seems that the countries in Africa have relatively similar GDP per capita, while Asia shows the most diverse set of countries in terms of GDP per capita.
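The spread and the country count can be computed in a single `ddply` call. The following sketch uses a made-up `toy` data frame with two observations per country; the standard deviation is taken over all country-year rows within a continent, which we assume matches the table above.

```r
library(plyr)

# Toy data: two continents, two countries each, two observations per country
toy <- data.frame(
  continent = rep(c("Africa", "Asia"), each = 4),
  country   = rep(c("A1", "A2", "B1", "B2"), each = 2),
  gdpPercap = c(1, 2, 3, 4, 10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Standard deviation over all rows, and number of distinct countries
spread <- ddply(toy, ~ continent, summarize,
                sdGdpPercap = sd(gdpPercap),
                nCountry    = length(unique(country)))

# Sort from smallest to largest spread
spread <- arrange(spread, sdGdpPercap)
print(spread)
```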
Next, we aim our focus on Life Expectancy. It would be interesting to take a look at global life expectancy and how it changed over the years. For each year, we calculate the mean life expectancy, the trimmed mean life expectancy, and the median life expectancy. The trimmed mean is calculated with 10% of the observations trimmed from each end of the life expectancy data in each year.
| | year | mean | trimMean | median |
|---|---|---|---|---|
| 1 | 1952 | 49.06 | 48.58 | 45.14 |
| 2 | 1957 | 51.51 | 51.27 | 48.36 |
| 3 | 1962 | 53.61 | 53.58 | 50.88 |
| 4 | 1967 | 55.68 | 55.87 | 53.83 |
| 5 | 1972 | 57.65 | 58.01 | 56.53 |
| 6 | 1977 | 59.57 | 60.10 | 59.67 |
| 7 | 1982 | 61.53 | 62.12 | 62.44 |
| 8 | 1987 | 63.21 | 63.92 | 65.83 |
| 9 | 1992 | 64.16 | 65.19 | 67.70 |
| 10 | 1997 | 65.01 | 66.02 | 69.39 |
| 11 | 2002 | 65.69 | 66.72 | 70.83 |
| 12 | 2007 | 67.01 | 68.11 | 71.94 |
Generally, it looks like life expectancy has been increasing over the years from 1952 to 2007; there is a clear upward trend in life expectancy. The three measures of center seem to generally agree with each other. The median shows the greatest change in life expectancy out of the three statistics.
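The three measures of center can be computed per year as sketched below. Base R's `mean` takes a `trim` argument giving the fraction of observations dropped from each end, so `trim = 0.1` matches the 10% trimming described above. The `toy` values are made up, and the column names `meanLife` and `medLife` are chosen here to avoid reusing function names.

```r
library(plyr)

# Toy data: ten made-up life expectancies per year, with one low and one high outlier
v <- c(30, 40, 41, 42, 43, 44, 45, 46, 47, 90)
toy <- data.frame(year    = rep(c(1952, 2007), each = 10),
                  lifeExp = c(v, v + 20))

# Mean, 10% trimmed mean, and median of life expectancy for each year
lifeByYear <- ddply(toy, ~ year, summarize,
                    meanLife = mean(lifeExp),
                    trimMean = mean(lifeExp, trim = 0.1),
                    medLife  = median(lifeExp))
print(lifeByYear)
```

With ten observations per year, `trim = 0.1` drops exactly one value from each end, so the trimmed mean ignores both outliers.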
How about we look at the mean and median life expectancy by continent and year? Since we are calculating some summary statistic, we also include the number of observations used to calculate the statistic.
| | continent | year | meanLife | medLife | nCountry |
|---|---|---|---|---|---|
| 1 | Africa | 1952 | 39.14 | 38.83 | 52 |
| 2 | Africa | 1957 | 41.27 | 40.59 | 52 |
| 3 | Africa | 1962 | 43.32 | 42.63 | 52 |
| 4 | Africa | 1967 | 45.33 | 44.70 | 52 |
| 5 | Africa | 1972 | 47.45 | 47.03 | 52 |
| 6 | Africa | 1977 | 49.58 | 49.27 | 52 |
| 7 | Africa | 1982 | 51.59 | 50.76 | 52 |
| 8 | Africa | 1987 | 53.34 | 51.64 | 52 |
| 9 | Africa | 1992 | 53.63 | 52.43 | 52 |
| 10 | Africa | 1997 | 53.60 | 52.76 | 52 |
| 11 | Africa | 2002 | 53.33 | 51.24 | 52 |
| 12 | Africa | 2007 | 54.81 | 52.93 | 52 |
| 13 | Americas | 1952 | 53.28 | 54.74 | 25 |
| 14 | Americas | 1957 | 55.96 | 56.07 | 25 |
| 15 | Americas | 1962 | 58.40 | 58.30 | 25 |
| 16 | Americas | 1967 | 60.41 | 60.52 | 25 |
| 17 | Americas | 1972 | 62.39 | 63.44 | 25 |
| 18 | Americas | 1977 | 64.39 | 66.35 | 25 |
| 19 | Americas | 1982 | 66.23 | 67.41 | 25 |
| 20 | Americas | 1987 | 68.09 | 69.50 | 25 |
| 21 | Americas | 1992 | 69.57 | 69.86 | 25 |
| 22 | Americas | 1997 | 71.15 | 72.15 | 25 |
| 23 | Americas | 2002 | 72.42 | 72.05 | 25 |
| 24 | Americas | 2007 | 73.61 | 72.90 | 25 |
| 25 | Asia | 1952 | 46.31 | 44.87 | 33 |
| 26 | Asia | 1957 | 49.32 | 48.28 | 33 |
| 27 | Asia | 1962 | 51.56 | 49.33 | 33 |
| 28 | Asia | 1967 | 54.66 | 53.66 | 33 |
| 29 | Asia | 1972 | 57.32 | 56.95 | 33 |
| 30 | Asia | 1977 | 59.61 | 60.77 | 33 |
| 31 | Asia | 1982 | 62.62 | 63.74 | 33 |
| 32 | Asia | 1987 | 64.85 | 66.30 | 33 |
| 33 | Asia | 1992 | 66.54 | 68.69 | 33 |
| 34 | Asia | 1997 | 68.02 | 70.27 | 33 |
| 35 | Asia | 2002 | 69.23 | 71.03 | 33 |
| 36 | Asia | 2007 | 70.73 | 72.40 | 33 |
| 37 | Europe | 1952 | 64.41 | 65.90 | 30 |
| 38 | Europe | 1957 | 66.70 | 67.65 | 30 |
| 39 | Europe | 1962 | 68.54 | 69.53 | 30 |
| 40 | Europe | 1967 | 69.74 | 70.61 | 30 |
| 41 | Europe | 1972 | 70.78 | 70.89 | 30 |
| 42 | Europe | 1977 | 71.94 | 72.34 | 30 |
| 43 | Europe | 1982 | 72.81 | 73.49 | 30 |
| 44 | Europe | 1987 | 73.64 | 74.81 | 30 |
| 45 | Europe | 1992 | 74.44 | 75.45 | 30 |
| 46 | Europe | 1997 | 75.51 | 76.12 | 30 |
| 47 | Europe | 2002 | 76.70 | 77.54 | 30 |
| 48 | Europe | 2007 | 77.65 | 78.61 | 30 |
| 49 | Oceania | 1952 | 69.25 | 69.25 | 2 |
| 50 | Oceania | 1957 | 70.30 | 70.30 | 2 |
| 51 | Oceania | 1962 | 71.09 | 71.09 | 2 |
| 52 | Oceania | 1967 | 71.31 | 71.31 | 2 |
| 53 | Oceania | 1972 | 71.91 | 71.91 | 2 |
| 54 | Oceania | 1977 | 72.85 | 72.85 | 2 |
| 55 | Oceania | 1982 | 74.29 | 74.29 | 2 |
| 56 | Oceania | 1987 | 75.32 | 75.32 | 2 |
| 57 | Oceania | 1992 | 76.94 | 76.94 | 2 |
| 58 | Oceania | 1997 | 78.19 | 78.19 | 2 |
| 59 | Oceania | 2002 | 79.74 | 79.74 | 2 |
| 60 | Oceania | 2007 | 80.72 | 80.72 | 2 |
We notice, again, that the “tall” data format is not the best way to view this data. It may be easy to follow the life expectancy for a single continent at a time, but it is not easy to make direct comparisons of life expectancy for different continents at the same year.
We apply some plyr kung-fu using the daply function to force our data to be aggregated as a “wide” table. (Normally, this task can be easily accomplished by using Hadley Wickham's reshape or reshape2 package, but Jenny challenged us with using only plyr.) Since we are going to a “wide” table, let us just calculate the mean life expectancy for each continent and year, to keep things simple and readable.
| year | Africa | Americas | Asia | Europe | Oceania |
|---|---|---|---|---|---|
| 1952 | 39.14 | 53.28 | 46.31 | 64.41 | 69.25 |
| 1957 | 41.27 | 55.96 | 49.32 | 66.70 | 70.30 |
| 1962 | 43.32 | 58.40 | 51.56 | 68.54 | 71.09 |
| 1967 | 45.33 | 60.41 | 54.66 | 69.74 | 71.31 |
| 1972 | 47.45 | 62.39 | 57.32 | 70.78 | 71.91 |
| 1977 | 49.58 | 64.39 | 59.61 | 71.94 | 72.85 |
| 1982 | 51.59 | 66.23 | 62.62 | 72.81 | 74.29 |
| 1987 | 53.34 | 68.09 | 64.85 | 73.64 | 75.32 |
| 1992 | 53.63 | 69.57 | 66.54 | 74.44 | 76.94 |
| 1997 | 53.60 | 71.15 | 68.02 | 75.51 | 78.19 |
| 2002 | 53.33 | 72.42 | 69.23 | 76.70 | 79.74 |
| 2007 | 54.81 | 73.61 | 70.73 | 77.65 | 80.72 |
Again, we see the advantage of using a “wide” table format over a “tall” format when we want to manually interpret the data in a table. It looks like the secret of long life resides in Oceania. Africa seems to be the most dangerous continent for life: although its average life expectancy increased quite a bit since 1952, by 2007 (54.81 years) it had only caught up with the level the Americas had already reached in 1952 (53.28 years)! Asia shows the greatest increase in mean life expectancy in the range of the data provided.
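The “wide” layout falls out of `daply` naturally, because it returns an array with one dimension per splitting variable. A minimal sketch with made-up `toy` values:

```r
library(plyr)

# Toy data: two continents, two years, two made-up observations per cell
toy <- data.frame(
  continent = rep(c("Africa", "Asia"), each = 2, times = 2),
  year      = rep(c(1952, 2007), each = 4),
  lifeExp   = c(38, 40, 45, 47, 54, 56, 70, 72),
  stringsAsFactors = FALSE
)

# Splitting on year and continent gives a year-by-continent matrix of means
wideLife <- daply(toy, ~ year + continent, function(x) mean(x$lifeExp))
print(wideLife)
```

The order of the variables in the formula decides which one ends up as rows and which as columns.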
Let us define an arbitrary value for “low life expectancy”. Say, if a country has a life expectancy smaller than the overall median life expectancy (calculated using the entire data set), then it is labelled as having low life expectancy. We tabulate the proportion of countries within each continent that have life expectancies smaller than the overall median (which is equal to 60.7125 years).
| year | Africa | Americas | Asia | Europe | Oceania |
|---|---|---|---|---|---|
| 1952 | 1.000 | 0.760 | 0.909 | 0.233 | 0.000 |
| 1957 | 1.000 | 0.640 | 0.818 | 0.100 | 0.000 |
| 1962 | 1.000 | 0.520 | 0.788 | 0.033 | 0.000 |
| 1967 | 0.981 | 0.520 | 0.758 | 0.033 | 0.000 |
| 1972 | 0.962 | 0.400 | 0.606 | 0.033 | 0.000 |
| 1977 | 0.962 | 0.280 | 0.485 | 0.033 | 0.000 |
| 1982 | 0.885 | 0.200 | 0.364 | 0.000 | 0.000 |
| 1987 | 0.788 | 0.080 | 0.303 | 0.000 | 0.000 |
| 1992 | 0.769 | 0.080 | 0.242 | 0.000 | 0.000 |
| 1997 | 0.846 | 0.040 | 0.212 | 0.000 | 0.000 |
| 2002 | 0.788 | 0.040 | 0.152 | 0.000 | 0.000 |
| 2007 | 0.788 | 0.000 | 0.091 | 0.000 | 0.000 |
We skip the “tall” format of this table and go directly to the “wide” format. Again, this is done using only plyr functions (namely, daply). This table tells a similar story to the table above it.
While investigating GDP per capita, we were also interested in the countries with the lowest and highest GDP per capita within each continent in each year.
Our investigation produced a rather unwieldy and hard-to-read table.
| | continent | year | minCntry | minGDP | maxCntry | maxGDP |
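The proportion below a cutoff is just the mean of a logical vector, so the same `daply` pattern applies. In this sketch the `toy` values are made up, and `cutoff` is a name introduced here for the overall median.

```r
library(plyr)

# Toy data: two continents, two years, two made-up observations per cell
toy <- data.frame(
  continent = rep(c("Africa", "Asia"), each = 2, times = 2),
  year      = rep(c(1952, 2007), each = 4),
  lifeExp   = c(38, 40, 45, 60, 50, 56, 70, 72),
  stringsAsFactors = FALSE
)

# Overall median, computed once on the whole data set (53 for these toy values)
cutoff <- median(toy$lifeExp)

# Proportion of observations below the cutoff in each year-by-continent cell
propLow <- daply(toy, ~ year + continent, function(x) mean(x$lifeExp < cutoff))
print(propLow)
```

Computing the cutoff outside the split matters: a median taken inside each piece would make every proportion close to one half.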
|---|---|---|---|---|---|---|
| 1 | Africa | 1952 | Lesotho | 298.85 | South Africa | 4725.30 |
| 2 | Africa | 1957 | Lesotho | 336.00 | South Africa | 5487.10 |
| 3 | Africa | 1962 | Burundi | 355.20 | Libya | 6757.03 |
| 4 | Africa | 1967 | Burundi | 412.98 | Libya | 18772.75 |
| 5 | Africa | 1972 | Burundi | 464.10 | Libya | 21011.50 |
| 6 | Africa | 1977 | Mozambique | 502.32 | Libya | 21951.21 |
| 7 | Africa | 1982 | Mozambique | 462.21 | Libya | 17364.28 |
| 8 | Africa | 1987 | Mozambique | 389.88 | Gabon | 11864.41 |
| 9 | Africa | 1992 | Mozambique | 410.90 | Gabon | 13522.16 |
| 10 | Africa | 1997 | Congo, Dem. Rep. | 312.19 | Gabon | 14722.84 |
| 11 | Africa | 2002 | Congo, Dem. Rep. | 241.17 | Gabon | 12521.71 |
| 12 | Africa | 2007 | Congo, Dem. Rep. | 277.55 | Gabon | 13206.48 |
| 13 | Americas | 1952 | Dominican Republic | 1397.72 | United States | 13990.48 |
| 14 | Americas | 1957 | Dominican Republic | 1544.40 | United States | 14847.13 |
| 15 | Americas | 1962 | Dominican Republic | 1662.14 | United States | 16173.15 |
| 16 | Americas | 1967 | Haiti | 1452.06 | United States | 19530.37 |
| 17 | Americas | 1972 | Haiti | 1654.46 | United States | 21806.04 |
| 18 | Americas | 1977 | Haiti | 1874.30 | United States | 24072.63 |
| 19 | Americas | 1982 | Haiti | 2011.16 | United States | 25009.56 |
| 20 | Americas | 1987 | Haiti | 1823.02 | United States | 29884.35 |
| 21 | Americas | 1992 | Haiti | 1456.31 | United States | 32003.93 |
| 22 | Americas | 1997 | Haiti | 1341.73 | United States | 35767.43 |
| 23 | Americas | 2002 | Haiti | 1270.36 | United States | 39097.10 |
| 24 | Americas | 2007 | Haiti | 1201.64 | United States | 42951.65 |
| 25 | Asia | 1952 | Myanmar | 331.00 | Kuwait | 108382.35 |
| 26 | Asia | 1957 | Myanmar | 350.00 | Kuwait | 113523.13 |
| 27 | Asia | 1962 | Myanmar | 388.00 | Kuwait | 95458.11 |
| 28 | Asia | 1967 | Myanmar | 349.00 | Kuwait | 80894.88 |
| 29 | Asia | 1972 | Myanmar | 357.00 | Kuwait | 109347.87 |
| 30 | Asia | 1977 | Myanmar | 371.00 | Kuwait | 59265.48 |
| 31 | Asia | 1982 | Myanmar | 424.00 | Saudi Arabia | 33693.18 |
| 32 | Asia | 1987 | Myanmar | 385.00 | Kuwait | 28118.43 |
| 33 | Asia | 1992 | Myanmar | 347.00 | Kuwait | 34932.92 |
| 34 | Asia | 1997 | Myanmar | 415.00 | Kuwait | 40300.62 |
| 35 | Asia | 2002 | Myanmar | 611.00 | Singapore | 36023.11 |
| 36 | Asia | 2007 | Myanmar | 944.00 | Kuwait | 47306.99 |
| 37 | Europe | 1952 | Bosnia and Herzegovina | 973.53 | Switzerland | 14734.23 |
| 38 | Europe | 1957 | Bosnia and Herzegovina | 1353.99 | Switzerland | 17909.49 |
| 39 | Europe | 1962 | Bosnia and Herzegovina | 1709.68 | Switzerland | 20431.09 |
| 40 | Europe | 1967 | Bosnia and Herzegovina | 2172.35 | Switzerland | 22966.14 |
| 41 | Europe | 1972 | Bosnia and Herzegovina | 2860.17 | Switzerland | 27195.11 |
| 42 | Europe | 1977 | Bosnia and Herzegovina | 3528.48 | Switzerland | 26982.29 |
| 43 | Europe | 1982 | Albania | 3630.88 | Switzerland | 28397.72 |
| 44 | Europe | 1987 | Albania | 3738.93 | Norway | 31540.97 |
| 45 | Europe | 1992 | Albania | 2497.44 | Norway | 33965.66 |
| 46 | Europe | 1997 | Albania | 3193.05 | Norway | 41283.16 |
| 47 | Europe | 2002 | Albania | 4604.21 | Norway | 44683.98 |
| 48 | Europe | 2007 | Albania | 5937.03 | Norway | 49357.19 |
| 49 | Oceania | 1952 | Australia | 10039.60 | New Zealand | 10556.58 |
| 50 | Oceania | 1957 | Australia | 10949.65 | New Zealand | 12247.40 |
| 51 | Oceania | 1962 | Australia | 12217.23 | New Zealand | 13175.68 |
| 52 | Oceania | 1967 | New Zealand | 14463.92 | Australia | 14526.12 |
| 53 | Oceania | 1972 | New Zealand | 16046.04 | Australia | 16788.63 |
| 54 | Oceania | 1977 | New Zealand | 16233.72 | Australia | 18334.20 |
| 55 | Oceania | 1982 | New Zealand | 17632.41 | Australia | 19477.01 |
| 56 | Oceania | 1987 | New Zealand | 19007.19 | Australia | 21888.89 |
| 57 | Oceania | 1992 | New Zealand | 18363.32 | Australia | 23424.77 |
| 58 | Oceania | 1997 | New Zealand | 21050.41 | Australia | 26997.94 |
| 59 | Oceania | 2002 | New Zealand | 23189.80 | Australia | 30687.75 |
| 60 | Oceania | 2007 | New Zealand | 25185.01 | Australia | 34435.37 |
Nothing seems to immediately stand out in the table. Usually, countries with the smallest or largest GDP per capita remain so for at least several decades. Also, the minimum and maximum GDP per capita seem to be generally increasing over time, though this is not always the case.
Then we have Asia, the maximum GDP per capita winner according to our previous tables. However, while the rest of the world is slowly increasing in GDP per capita, the richest Asian country, Kuwait, shows a declining trend in GDP per capita, even losing its crown as the top GDP per capita country twice in the process (once to Saudi Arabia in 1982, and once to Singapore in 2002).
For the countries in Asia, we can perform a simple linear regression of GDP per capita versus time and find the slope and the intercept for each country. As we did in class, we shift the year variable by subtracting the smallest year in the data, so that the intercept is the fitted value in 1952. The estimated intercepts and slopes are in the next table, which is sorted from the lowest to the highest slope estimate.
| | continent | country | intercept | slope |
|---|---|---|---|---|
| 16 | Asia | Kuwait | 108891.72 | -1583.96 |
| 10 | Asia | Iraq | 9149.28 | -48.64 |
| 1 | Asia | Afghanistan | 814.79 | -0.44 |
| 20 | Asia | Myanmar | 258.41 | 6.58 |
| 21 | Asia | Nepal | 512.02 | 9.84 |
| 3 | Asia | Bangladesh | 528.71 | 10.50 |
| 14 | Asia | Korea, Dem. Rep. | 2287.16 | 11.08 |
| 4 | Asia | Cambodia | 246.51 | 15.59 |
| 31 | Asia | Vietnam | 317.54 | 25.46 |
| 7 | Asia | India | 286.26 | 28.04 |
| 24 | Asia | Philippines | 1398.12 | 28.24 |
| 33 | Asia | Yemen, Rep. | 689.36 | 32.00 |
| 19 | Asia | Mongolia | 764.46 | 33.76 |
| 23 | Asia | Pakistan | 490.30 | 34.51 |
| 27 | Asia | Sri Lanka | 551.88 | 47.38 |
| 28 | Asia | Syria | 1702.39 | 47.52 |
| 13 | Asia | Jordan | 1759.26 | 49.78 |
| 8 | Asia | Indonesia | 301.45 | 52.36 |
| 5 | Asia | China | -303.78 | 65.17 |
| 32 | Asia | West Bank and Gaza | 1863.86 | 68.95 |
| 17 | Asia | Lebanon | 5168.04 | 76.41 |
| 9 | Asia | Iran | 4110.93 | 118.75 |
| 30 | Asia | Thailand | -322.31 | 122.48 |
| 18 | Asia | Malaysia | -24.07 | 197.46 |
| 25 | Asia | Saudi Arabia | 13417.84 | 248.87 |
| 2 | Asia | Bahrain | 10391.29 | 279.50 |
| 11 | Asia | Israel | 3692.09 | 380.69 |
| 15 | Asia | Korea, Rep. | -2826.11 | 401.58 |
| 22 | Asia | Oman | 721.73 | 415.16 |
| 29 | Asia | Taiwan | -3477.60 | 498.27 |
| 12 | Asia | Japan | 2413.51 | 557.72 |
| 6 | Asia | Hong Kong, China | -1843.04 | 657.15 |
| 26 | Asia | Singapore | -4389.13 | 793.25 |
It seems that, according to our very simple linear regression model*, Kuwait starts with a very high intercept, but has a very large negative slope! On the other hand, Singapore, which we found overtook the GDP per capita of Kuwait in 2002, started off with a moderately negative intercept, but caught up due to a very large slope!
*Note that the assumptions for the linear regression fits were not checked. We doubt that a linear regression model is suitable for our data.
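The per-country fits can be sketched as follows: `lm` is run on each piece, with the year shifted by the smallest year so that the intercept corresponds to the first year in the data. The `toy` data and the country labels `K` and `S` are made up, with values chosen to lie exactly on a line.

```r
library(plyr)

# Toy data: K declines by 2 per year from 100, S grows by 4 per year from 10
toy <- data.frame(
  country   = rep(c("K", "S"), each = 3),
  year      = rep(c(1952, 1957, 1962), times = 2),
  gdpPercap = c(100, 90, 80, 10, 30, 50),
  stringsAsFactors = FALSE
)

# Smallest year in the data, used to shift the time axis
yearMin <- min(toy$year)

# Fit gdpPercap ~ (year - yearMin) for each country and keep the two coefficients
coefs <- ddply(toy, ~ country, function(x) {
  fit <- lm(gdpPercap ~ I(year - yearMin), data = x)
  data.frame(intercept = unname(coef(fit)[1]),
             slope     = unname(coef(fit)[2]))
})

# Sort from the lowest to the highest slope
coefs <- arrange(coefs, slope)
print(coefs)
```

As with the fits in the report, this sketch does not check the regression assumptions.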
It would be interesting to investigate the GDP per capita of Asia at a future date with plots instead of just tables.
In the end, we found it quite challenging to explore the Gapminder dataset using only tables. On the other hand, this homework provided great practice in using the data aggregation package plyr. Using ddply made everything much easier, and I have gained an appreciation for the “Split-Apply-Combine” methodology preached by Hadley Wickham and recommended by Jenny. I believe that it will be a very useful and powerful tool for data analysis when combined with making plots.
For the code used to generate this report, click here