The data on area, land use, and population are from the UN FAO records for 2009.
library(ggplot2) # ggplot(), geom_point(), and the log10 scales
data(FAOsimple) # Read in the data: boilerplate
ggplot(data = FAOsimple, aes(x = Country.area, y = Total.Population...Both.sexes)) +
geom_point() + scale_x_log10() + scale_y_log10()
According to the plot, countries with larger areas tend to have larger populations.
ggplot(data = FAOsimple, aes(x = Arable.land, y = Total.economically.active.population.in.Agr)) +
geom_point() + scale_x_log10() + scale_y_log10()
This command plots the amount of arable land in each country against the number of people who are economically active in agriculture, both on logarithmic axes. There appears to be a positive relationship between the two.
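To work with shares explicitly, each quantity can be divided by the corresponding total before plotting. The following is a minimal sketch on a made-up data frame: the three countries and all of the numbers are invented for illustration, only the column names follow `FAOsimple`, and the choice of `Land.area` as the denominator for the arable share is an assumption.

```r
# Toy data: invented numbers; column names mirror those in FAOsimple
toy <- data.frame(
  Country = c("A", "B", "C"),
  Arable.land = c(10, 200, 50),
  Land.area = c(100, 1000, 200),
  Total.economically.active.population.in.Agr = c(5, 40, 30),
  Total.Population...Both.sexes = c(50, 400, 100)
)

# Share of land that is arable, and share of people active in agriculture
# (arable.frac and agr.frac are hypothetical names, not columns of FAOsimple)
toy <- transform(
  toy,
  arable.frac = Arable.land / Land.area,
  agr.frac = Total.economically.active.population.in.Agr /
    Total.Population...Both.sexes
)

print(toy[, c("Country", "arable.frac", "agr.frac")])
```

The derived columns could then be passed to `aes()` in place of the raw columns.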
Population density can be calculated in two ways: divide the total number of people in a country by either its agricultural area or its land area.
First Method:
FAOsimple = transform(FAOsimple, popdens = Total.Population...Both.sexes/Agricultural.area)
Second Method:
FAOsimple = transform(FAOsimple, popdense = Total.Population...Both.sexes/Land.area)
names(FAOsimple)
## [1] "Country"
## [2] "Year"
## [3] "Agricultural.area"
## [4] "Agricultural.area.certified.organic"
## [5] "Agricultural.area.in.conversion.to.organic"
## [6] "Agricultural.area.irrigated"
## [7] "Agricultural.area.organic..total"
## [8] "Arable.land"
## [9] "Arable.land.and.Permanent.crops"
## [10] "Arable.land.area.certified.organic"
## [11] "Arable.land.area.in.conversion.to.organic"
## [12] "Arable.land.organic..total"
## [13] "Country.area"
## [14] "Fallow.land"
## [15] "Forest.area"
## [16] "Inland.water"
## [17] "Land.area"
## [18] "Other.land"
## [19] "Perm..crops.irrigated"
## [20] "Perm..crops.non.irrigated"
## [21] "Perm..meadows...pastures...Cultivated"
## [22] "Perm..meadows...pastures...Nat..grown"
## [23] "Perm..meadows...pastures.Cult....irrig"
## [24] "Perm..meadows...pastures.Cult..non.irrig"
## [25] "Permanent.crops"
## [26] "Permanent.crops.area.certified.organic"
## [27] "Permanent.crops.area.in.conversion.to.organic"
## [28] "Permanent.crops.organic..total"
## [29] "Permanent.meadows.and.pastures"
## [30] "Permanent.meadows.and.pastures.area.certified.organic"
## [31] "Permanent.meadows.and.pastures.area.in.conversion.to.organic"
## [32] "Permanent.meadows.and.pastures.organic..total"
## [33] "Temp..crops.irrigated"
## [34] "Temp..crops.non.irrigated"
## [35] "Temp..meadows...pastures.irrigated"
## [36] "Temp..meadows...pastures.non.irrig."
## [37] "Temporary.crops"
## [38] "Temporary.meadows.and.pastures"
## [39] "Total.area.equipped.for.irrigation"
## [40] "Agricultural.population"
## [41] "Female.economically.active.population"
## [42] "Female.economically.active.population.in.Agr"
## [43] "Male.economically.active.population"
## [44] "Male.economically.active.population.in.Agr"
## [45] "Non.agricultural.population"
## [46] "Rural.population"
## [47] "Total.economically.active.population"
## [48] "Total.economically.active.population.in.Agr"
## [49] "Total.Population...Both.sexes"
## [50] "Total.Population...Female"
## [51] "Total.Population...Male"
## [52] "Urban.population"
## [53] "popdens"
## [54] "popdense"
The new variables appear at the end of the list of names.
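The same steps can be tried on a small example where the arithmetic is easy to check by hand. This is a sketch with invented numbers; only the column names are taken from `FAOsimple`, and the units are whatever the data happen to use.

```r
# Toy data: invented numbers; column names mirror those in FAOsimple
toy <- data.frame(
  Country = c("A", "B"),
  Total.Population...Both.sexes = c(1000, 5000),
  Agricultural.area = c(10, 100),
  Land.area = c(50, 250)
)

# Both densities can be added in a single transform() call
toy <- transform(
  toy,
  popdens = Total.Population...Both.sexes / Agricultural.area,
  popdense = Total.Population...Both.sexes / Land.area
)

print(tail(names(toy), 2)) # the two new columns come last
print(toy)
```

Because the two denominators differ, the two densities generally differ as well, so it helps to state which one is meant whenever a density is reported.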
To check the reported total population against the sex-specific counts, make a new variable that adds together the female and male populations.
FAOsimple = transform(FAOsimple, poptotal = Total.Population...Female + Total.Population...Male)
ggplot(data = FAOsimple, aes(x = Total.Population...Both.sexes, y = poptotal)) +
geom_point() + scale_x_log10() + scale_y_log10()
The summed and reported totals are strongly, positively, and linearly related, which suggests that the population columns are internally consistent. This does not by itself show that the underlying figures are accurate.
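A scatterplot shows overall agreement, but the row-by-row differences can also be inspected directly. The sketch below uses invented numbers, with one deliberate mismatch, and a hypothetical helper column named `diff`; it is one possible check, not part of the original analysis.

```r
# Toy data: invented numbers; the last row is off by one on purpose
toy <- data.frame(
  Total.Population...Female = c(510, 2490, 75),
  Total.Population...Male = c(490, 2510, 80),
  Total.Population...Both.sexes = c(1000, 5000, 156)
)

# Sum the sex-specific counts, as in the transform() call above
toy <- transform(
  toy,
  poptotal = Total.Population...Female + Total.Population...Male
)

# Difference between the summed and the reported totals
toy$diff <- toy$poptotal - toy$Total.Population...Both.sexes

# Keep only the rows where the two totals disagree
mismatch <- toy[abs(toy$diff) > 0, ]
print(mismatch)
```

With real data, rows containing missing values would give `NA` differences and would need to be handled separately.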
As the amount of farmable land increases, one would expect that forest area decreases in order to make room for it. Check the hypothesis with a plot.
ggplot(data = FAOsimple, aes(x = Arable.land, y = Forest.area)) + geom_point() +
scale_x_log10() + scale_y_log10()
## Warning: Removed 7 rows containing missing values (geom_point).
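There is a positive association between these variables: from the plot, forest area appears to increase along with arable land, which does not support the hypothesis. One plausible reason is that both quantities grow with the overall size of a country.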
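The strength of an association like this one can also be summarized with a correlation computed on the log scale. The following is a sketch on invented numbers, including one missing value to mimic the rows that the plot drops; the variable `r_log` is a hypothetical name.

```r
# Toy data: invented numbers; one NA stands in for a missing record
toy <- data.frame(
  Arable.land = c(10, 100, 1000, 10000, NA),
  Forest.area = c(20, 150, 3000, 8000, 500)
)

# Pearson correlation of the log10 values.
# use = "complete.obs" drops any row that has a missing value.
r_log <- cor(
  log10(toy$Arable.land),
  log10(toy$Forest.area),
  use = "complete.obs"
)
print(r_log)
```

A value near 1 would indicate a strong positive linear association on the log-log scale, in line with what the plot suggests.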
In the previous example, most data points fall in the same range, but there are a few outliers. On linear axes, fitting every point onto one plot squeezes most of the data into a small region, so it is hard to see what is going on.
A plot of the same data on the original (linear) axes:
ggplot(data = FAOsimple, aes(x = Arable.land, y = Forest.area)) + geom_point()
## Warning: Removed 7 rows containing missing values (geom_point).
From this plot, it is not obvious that there is a relationship between the variables, but when logarithmic axes are used, it becomes clear that the variables are positively associated.
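The effect of the axis choice can be illustrated numerically. The sketch below uses invented values with a single large outlier and computes where each value would sit along a linear axis and along a logarithmic axis, expressed as a fraction of the axis length; `linear_pos` and `log_pos` are hypothetical names.

```r
# Invented values spanning several orders of magnitude, with one large outlier
x <- c(1, 2, 5, 10, 20, 50, 100, 100000)

# Position of each value as a fraction of the axis length (0 = left, 1 = right)
linear_pos <- (x - min(x)) / (max(x) - min(x))
log_pos <- (log10(x) - min(log10(x))) / (max(log10(x)) - min(log10(x)))

print(round(data.frame(x, linear_pos, log_pos), 3))
```

On the linear axis every value except the outlier lands within the first 1% of the axis, while on the logarithmic axis the same values are spread over the first 40%.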