library(readxl)   # read_excel()
library(ggplot2)  # ggplot() and the geoms used below
library(GGally)   # ggpairs()
Data <- read_excel("~/Desktop/EPSRC Project /ArcLakeGroupSummary.xlsx")

1 Data Checking & Descriptions

str(Data)
## Classes 'tbl_df', 'tbl' and 'data.frame':    732 obs. of  12 variables:
##  $ GloboLakes_ID : num  2 3 4 5 6 7 8 9 10 11 ...
##  $ LakeName      : chr  "SUPERIOR" "VICTORIA" "ARAL Sea" "HURON" ...
##  $ Group         : num  4 6 9 5 9 6 4 4 6 4 ...
##  $ Latitude      : num  47.7 -1.3 45.1 44.8 43.9 ...
##  $ Longitude     : num  -88.2 33.2 60.1 -82.2 -87.1 ...
##  $ Type          : chr  "Lake" "Lake" "Lake" "Lake" ...
##  $ Elevation     : num  184 1140 42 176 176 837 450 157 485 158 ...
##  $ LakeSize      : num  4094 2313 1535 2763 2710 ...
##  $ OverallAvg    : num  6.1 25.14 11.11 8.35 9.42 ...
##  $ OverallMeanAmp: num  14.65 1.56 26.72 19.28 19.26 ...
##  $ PC1           : num  -41.3 100.1 -11.1 -29.7 -24.2 ...
##  $ PC2           : num  -19.132 -13.852 35.134 0.822 3.668 ...
Variable Name     Description
Group             The group number (there are 9 in total)
Latitude          The latitude of the lake
Longitude         The longitude of the lake
Type              A categorical variable with three levels: lake, lagoon and reservoir
Elevation         The altitude of the lake (m)
LakeSize          The size of the lake, in terms of the number of ~1 km pixels
OverallAvg        The mean temperature of the lake (Celsius)
OverallMeanAmp    The overall amplitude of the lake (Celsius)
PC1 & PC2         Scores describing the variability of the lakes (derived from a PC analysis of the temperature data)

A lake is a large body of water (larger and deeper than a pond) within a body of land. A lagoon is a shallow body of water separated from a larger body of water by barrier islands or reefs. A reservoir is a large natural or artificial lake used as a source of water supply.

2 Exploratory Impressions of Numerical Summaries

Separating the data set into the 9 classification groups lets us obtain numerical summaries for each group and form some initial impressions of how the groups differ and how they might be separated.
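The code that creates Group1 to Group9 is not shown in this report. A minimal sketch of one way to build them, assuming each is simply the subset of rows of Data with the corresponding Group value. The data frame `toy` and the list `groups` are names introduced here for illustration; `toy` holds the first five rows of a few columns as printed by str(Data) above.

```r
# Stand-in for Data: the first five rows of a few columns, as printed by str(Data).
toy <- data.frame(
  GloboLakes_ID = 2:6,
  Group         = c(4, 6, 9, 5, 9),
  Elevation     = c(184, 1140, 42, 176, 176),
  OverallAvg    = c(6.1, 25.14, 11.11, 8.35, 9.42)
)

# split() returns a named list with one data frame per group label.
groups <- split(toy, toy$Group)
names(groups)          # "4" "5" "6" "9"
nrow(groups[["9"]])    # 2

# Assumption: with the full data set, the same idea would create Group1 ... Group9:
# groups <- split(Data, Data$Group)
# for (g in names(groups)) assign(paste0("Group", g), groups[[g]])
```

Keeping the groups in a single named list, rather than nine separate objects, also makes it possible to loop over them, for example with lapply(groups, summary).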

summary(Group1[,-c(1:3)])
##     Latitude       Longitude           Type             Elevation     
##  Min.   :24.85   Min.   :-122.77   Length:55          Min.   : -22.0  
##  1st Qu.:34.34   1st Qu.:  13.98   Class :character   1st Qu.:   3.0  
##  Median :38.07   Median :  45.49   Mode  :character   Median :  69.0  
##  Mean   :37.45   Mean   :  48.66                      Mean   : 360.7  
##  3rd Qu.:40.98   3rd Qu.: 116.66                      3rd Qu.: 470.0  
##  Max.   :46.90   Max.   : 140.37                      Max.   :2113.0  
##     LakeSize        OverallAvg    OverallMeanAmp       PC1        
##  Min.   :  4.00   Min.   :13.06   Min.   :11.28   Min.   : 1.849  
##  1st Qu.:  8.00   1st Qu.:14.70   1st Qu.:17.63   1st Qu.:14.049  
##  Median : 14.00   Median :16.35   Median :22.11   Median :23.793  
##  Mean   : 29.85   Mean   :16.29   Mean   :21.76   Mean   :23.664  
##  3rd Qu.: 28.50   3rd Qu.:17.74   3rd Qu.:25.95   3rd Qu.:32.784  
##  Max.   :283.00   Max.   :19.44   Max.   :28.72   Max.   :43.745  
##       PC2        
##  Min.   : 3.383  
##  1st Qu.:16.835  
##  Median :29.007  
##  Mean   :26.287  
##  3rd Qu.:38.206  
##  Max.   :42.859
summary(Group2[,-c(1:3)])
##     Latitude       Longitude           Type             Elevation     
##  Min.   :20.21   Min.   :-115.83   Length:43          Min.   :-202.0  
##  1st Qu.:29.29   1st Qu.: -91.82   Class :character   1st Qu.:   3.5  
##  Median :30.23   Median : -80.86   Mode  :character   Median :  22.0  
##  Mean   :30.32   Mean   : -22.88                      Mean   : 119.3  
##  3rd Qu.:32.79   3rd Qu.:  44.28                      3rd Qu.:  59.5  
##  Max.   :36.11   Max.   : 116.06                      Max.   :1801.0  
##     LakeSize       OverallAvg    OverallMeanAmp        PC1       
##  Min.   :  4.0   Min.   :18.95   Min.   : 6.987   Min.   :46.92  
##  1st Qu.:  7.0   1st Qu.:20.97   1st Qu.:14.831   1st Qu.:56.64  
##  Median : 14.0   Median :21.95   Median :18.532   Median :64.98  
##  Mean   : 25.3   Mean   :22.09   Mean   :17.420   Mean   :66.50  
##  3rd Qu.: 23.0   3rd Qu.:23.13   3rd Qu.:19.532   3rd Qu.:73.85  
##  Max.   :186.0   Max.   :25.44   Max.   :24.330   Max.   :92.46  
##       PC2         
##  Min.   :-0.8748  
##  1st Qu.:22.1256  
##  Median :28.8977  
##  Mean   :26.5181  
##  3rd Qu.:31.7008  
##  Max.   :38.3728
summary(Group3[,-c(1:3)])
##     Latitude         Longitude            Type             Elevation     
##  Min.   :-18.130   Min.   :-106.110   Length:79          Min.   :-404.0  
##  1st Qu.: -2.275   1st Qu.: -63.100   Class :character   1st Qu.:  25.0  
##  Median :  1.720   Median : -51.500   Mode  :character   Median :  89.0  
##  Mean   :  4.273   Mean   :  -4.154                      Mean   : 233.6  
##  3rd Qu.: 11.945   3rd Qu.:  36.080                      3rd Qu.: 383.5  
##  Max.   : 31.520   Max.   : 137.920                      Max.   :1056.0  
##     LakeSize        OverallAvg    OverallMeanAmp         PC1        
##  Min.   :  3.00   Min.   :25.98   Min.   : 0.4907   Min.   : 99.38  
##  1st Qu.:  7.00   1st Qu.:27.83   1st Qu.: 1.6521   1st Qu.:117.44  
##  Median : 16.00   Median :28.62   Median : 3.2274   Median :124.88  
##  Mean   : 33.75   Mean   :28.96   Mean   : 3.9325   Mean   :124.82  
##  3rd Qu.: 31.00   3rd Qu.:30.20   3rd Qu.: 5.2279   3rd Qu.:133.75  
##  Max.   :281.00   Max.   :31.76   Max.   :11.8852   Max.   :144.09  
##       PC2          
##  Min.   :-23.8997  
##  1st Qu.: -6.6334  
##  Median : -0.3507  
##  Mean   : -0.5058  
##  3rd Qu.:  6.8411  
##  Max.   : 23.1318
summary(Group4[,-c(1:3)])
##     Latitude       Longitude             Type             Elevation   
##  Min.   :28.55   Min.   :-176.0100   Length:121         Min.   :   0  
##  1st Qu.:51.46   1st Qu.:-123.9800   Class :character   1st Qu.:  46  
##  Median :62.35   Median :  31.8600   Mode  :character   Median : 259  
##  Mean   :56.93   Mean   :  -0.9774                      Mean   :1211  
##  3rd Qu.:67.69   3rd Qu.:  97.2700                      3rd Qu.:1127  
##  Max.   :81.80   Max.   : 174.4400                      Max.   :5182  
##     LakeSize        OverallAvg      OverallMeanAmp         PC1        
##  Min.   :   4.0   Min.   :0.04835   Min.   : 0.8017   Min.   :-73.94  
##  1st Qu.:  13.0   1st Qu.:2.44587   1st Qu.:12.2417   1st Qu.:-62.04  
##  Median :  22.0   Median :3.28293   Median :14.6697   Median :-56.74  
##  Mean   : 133.8   Mean   :3.27909   Mean   :14.4160   Mean   :-57.09  
##  3rd Qu.:  43.0   3rd Qu.:4.12047   3rd Qu.:16.6814   3rd Qu.:-52.09  
##  Max.   :4094.0   Max.   :6.09557   Max.   :21.1424   Max.   :-41.32  
##       PC2         
##  Min.   :-55.871  
##  1st Qu.:-27.654  
##  Median :-20.535  
##  Mean   :-21.576  
##  3rd Qu.:-13.910  
##  Max.   : -1.181
summary(Group5[,-c(1:3)])
##     Latitude       Longitude           Type             Elevation   
##  Min.   :30.90   Min.   :-142.77   Length:244         Min.   :   5  
##  1st Qu.:51.81   1st Qu.:-105.59   Class :character   1st Qu.: 151  
##  Median :54.34   Median : -93.51   Mode  :character   Median : 266  
##  Mean   :53.86   Mean   : -33.89                      Mean   : 490  
##  3rd Qu.:56.52   3rd Qu.:  41.32                      3rd Qu.: 528  
##  Max.   :65.02   Max.   : 140.53                      Max.   :4743  
##     LakeSize         OverallAvg    OverallMeanAmp       PC1        
##  Min.   :   4.00   Min.   :4.152   Min.   :13.72   Min.   :-52.23  
##  1st Qu.:   8.00   1st Qu.:5.542   1st Qu.:19.60   1st Qu.:-45.22  
##  Median :  14.00   Median :5.948   Median :20.81   Median :-42.73  
##  Mean   :  58.69   Mean   :6.054   Mean   :20.71   Mean   :-42.15  
##  3rd Qu.:  33.00   3rd Qu.:6.579   3rd Qu.:22.02   3rd Qu.:-39.27  
##  Max.   :2763.00   Max.   :8.349   Max.   :27.45   Max.   :-29.66  
##       PC2         
##  Min.   :-11.886  
##  1st Qu.:  1.943  
##  Median :  6.979  
##  Mean   :  6.899  
##  3rd Qu.: 12.838  
##  Max.   : 26.593
summary(Group6[,-c(1:3)])
##     Latitude         Longitude          Type             Elevation     
##  Min.   :-31.370   Min.   :-91.20   Length:42          Min.   :   1.0  
##  1st Qu.:-21.445   1st Qu.: 26.22   Class :character   1st Qu.: 397.5  
##  Median :-11.045   Median : 31.20   Mode  :character   Median : 809.0  
##  Mean   :-10.465   Mean   : 14.68                      Mean   : 846.8  
##  3rd Qu.: -1.448   3rd Qu.: 36.23                      3rd Qu.:1328.5  
##  Max.   : 14.670   Max.   :136.33                      Max.   :2074.0  
##     LakeSize         OverallAvg    OverallMeanAmp         PC1        
##  Min.   :   4.00   Min.   :21.02   Min.   : 0.9175   Min.   : 77.50  
##  1st Qu.:   7.25   1st Qu.:22.84   1st Qu.: 2.7803   1st Qu.: 86.82  
##  Median :  14.50   Median :24.39   Median : 6.0563   Median : 98.60  
##  Mean   : 142.88   Mean   :24.20   Mean   : 6.1236   Mean   : 96.29  
##  3rd Qu.:  71.50   3rd Qu.:25.58   3rd Qu.: 8.0284   3rd Qu.:106.53  
##  Max.   :2313.00   Max.   :26.75   Max.   :15.0455   Max.   :115.66  
##       PC2         
##  Min.   :-55.011  
##  1st Qu.:-31.274  
##  Median :-25.132  
##  Mean   :-25.300  
##  3rd Qu.:-14.010  
##  Max.   : -7.214
summary(Group7[,-c(1:3)])
##     Latitude        Longitude          Type             Elevation     
##  Min.   :-54.55   Min.   :-73.03   Length:19          Min.   :  21.0  
##  1st Qu.:-51.56   1st Qu.:-72.48   Class :character   1st Qu.: 174.5  
##  Median :-48.75   Median :-71.52   Mode  :character   Median : 264.0  
##  Mean   :-48.38   Mean   :-47.03                      Mean   : 391.3  
##  3rd Qu.:-45.44   3rd Qu.:-69.14                      3rd Qu.: 519.5  
##  Max.   :-40.92   Max.   :170.15                      Max.   :1155.0  
##     LakeSize        OverallAvg     OverallMeanAmp        PC1         
##  Min.   :  5.00   Min.   : 4.824   Min.   : 3.934   Min.   :-29.853  
##  1st Qu.: 10.50   1st Qu.: 6.627   1st Qu.: 5.881   1st Qu.:-19.783  
##  Median : 24.00   Median : 7.749   Median : 7.816   Median :-16.977  
##  Mean   : 33.32   Mean   : 7.930   Mean   : 8.728   Mean   :-12.992  
##  3rd Qu.: 50.00   3rd Qu.: 8.773   3rd Qu.:11.488   3rd Qu.: -3.498  
##  Max.   :109.00   Max.   :11.165   Max.   :17.881   Max.   :  6.935  
##       PC2        
##  Min.   :-83.49  
##  1st Qu.:-70.90  
##  Median :-59.39  
##  Mean   :-62.45  
##  3rd Qu.:-55.31  
##  Max.   :-48.70
summary(Group8[,-c(1:3)])
##     Latitude        Longitude          Type             Elevation     
##  Min.   :-41.14   Min.   :-76.15   Length:29          Min.   :   8.0  
##  1st Qu.:-38.81   1st Qu.:-69.30   Class :character   1st Qu.:  52.0  
##  Median :-36.04   Median :-62.61   Mode  :character   Median : 113.0  
##  Mean   :-34.42   Mean   :-21.13                      Mean   : 564.1  
##  3rd Qu.:-33.16   3rd Qu.:-53.25                      3rd Qu.: 292.0  
##  Max.   :-11.02   Max.   :175.90                      Max.   :3975.0  
##     LakeSize        OverallAvg    OverallMeanAmp        PC1       
##  Min.   :  7.00   Min.   :12.30   Min.   : 2.755   Min.   :16.49  
##  1st Qu.:  8.00   1st Qu.:13.93   1st Qu.: 9.290   1st Qu.:27.45  
##  Median : 17.00   Median :15.02   Median :12.215   Median :39.43  
##  Mean   : 42.69   Mean   :15.70   Mean   :11.363   Mean   :41.41  
##  3rd Qu.: 40.00   3rd Qu.:17.71   3rd Qu.:13.426   3rd Qu.:56.13  
##  Max.   :285.00   Max.   :20.01   Max.   :16.894   Max.   :72.38  
##       PC2        
##  Min.   :-76.65  
##  1st Qu.:-58.18  
##  Median :-55.54  
##  Mean   :-56.04  
##  3rd Qu.:-53.41  
##  Max.   :-38.63
summary(Group9[,-c(1:3)])
##     Latitude       Longitude             Type             Elevation     
##  Min.   :37.47   Min.   :-121.8900   Length:100         Min.   :   1.0  
##  1st Qu.:43.98   1st Qu.: -81.3675   Class :character   1st Qu.:  34.0  
##  Median :46.22   Median :  26.4050   Mode  :character   Median : 112.5  
##  Mean   :47.21   Mean   :  -0.8504                      Mean   : 359.3  
##  3rd Qu.:50.02   3rd Qu.:  47.0525                      3rd Qu.: 407.0  
##  Max.   :59.44   Max.   : 140.8900                      Max.   :1996.0  
##     LakeSize        OverallAvg     OverallMeanAmp       PC1         
##  Min.   :   2.0   Min.   : 7.623   Min.   :13.95   Min.   :-32.676  
##  1st Qu.:   8.0   1st Qu.: 8.526   1st Qu.:22.36   1st Qu.:-27.459  
##  Median :  16.0   Median : 9.554   Median :23.77   Median :-18.843  
##  Mean   : 101.3   Mean   : 9.824   Mean   :23.64   Mean   :-18.285  
##  3rd Qu.:  35.0   3rd Qu.:10.902   3rd Qu.:25.44   3rd Qu.:-10.377  
##  Max.   :2710.0   Max.   :12.775   Max.   :29.81   Max.   :  2.023  
##       PC2        
##  Min.   :-2.739  
##  1st Qu.:17.485  
##  Median :22.422  
##  Mean   :22.553  
##  3rd Qu.:29.537  
##  Max.   :45.189

The group summaries show a number of features that may help to discriminate between groups.

Looking at the Elevation summaries, every group apart from group 6 has a mean (a non-robust estimate) that differs greatly from its median (a robust estimate). This points to strongly skewed, and possibly multimodal, distributions for this variable.

There appear to be large differences in OverallAvg between groups, as indicated by both the medians and the means. This may be a variable that offers good discrimination between groups.

The sizes of the groups vary greatly (min = 19, max = 244).
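The group sizes and the gap between mean and median can be tabulated in one step. A minimal sketch, using only the first ten Group and Elevation values printed by str(Data) above as a stand-in for the full data; `toy` and `by.group` are names introduced here for illustration.

```r
# Stand-in for Data: the first ten Group and Elevation values from str(Data).
toy <- data.frame(
  Group     = c(4, 6, 9, 5, 9, 6, 4, 4, 6, 4),
  Elevation = c(184, 1140, 42, 176, 176, 837, 450, 157, 485, 158)
)

# table() and tapply() both order their results by the sorted group labels,
# so the three columns line up row by row.
by.group <- data.frame(
  n      = as.vector(table(toy$Group)),
  mean   = as.vector(tapply(toy$Elevation, toy$Group, mean)),
  median = as.vector(tapply(toy$Elevation, toy$Group, median)),
  row.names = as.character(sort(unique(toy$Group)))
)

# A large positive gap means the mean is pulled up by a few high-elevation lakes.
by.group$gap <- by.group$mean - by.group$median
by.group
```

With the full data set, replacing `toy` by Data would give one row per group, which makes the contrast between group 6 and the other groups easy to read off.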

3 Visual Impressions of Variables and Groups

Plotting the pairs plot of the variables, treating the groups as factors, we look for graphical distinctions between groups. Element [1,1] defines the colour coding of the groups. Lake size here is the log of the original lake size, because the original observations were strongly right-skewed (most lakes are small, with a few very large ones).

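# Treat Group as categorical and log-transform the heavily skewed lake sizes
Data$Group <- factor(Data$Group)
Data$LakeSize <- log(Data$LakeSize)

# Pairs plot of every variable except GloboLakes_ID, LakeName and Type (columns 1, 2 and 6)
ggpairs(Data[, -c(1, 2, 6)], aes(colour = Group),
        upper = list(continuous = "points", combo = "box_no_facet", discrete = "facetbar", na = "na"),
        lower = list(continuous = "points", combo = "box_no_facet", discrete = "facetbar", na = "na"))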

There are a number of illuminating points that can be made from the pairs plot above.

As can be seen in the first column of boxplots of the continuous variables against group, several plots suggest that the distributions of certain variables differ markedly between groups. Latitude, OverallAvg, OverallMeanAmp, PC1, PC2 and, to a lesser extent, Longitude appear to be such variables. The differing sizes of the groups are also worth bearing in mind when interpreting these plots.

A number of scatterplots offer a fair amount of discrimination between groups in two dimensions, for example Latitude against Longitude, Latitude against OverallAvg, and PC2 against PC1.

There is a strong positive linear relationship between OverallAvg and PC1, suggesting that the first principal component largely represents this variable.
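The strength of this linear relationship can be quantified with a correlation coefficient. A minimal sketch, using only the first five OverallAvg and PC1 values printed by str(Data) above, so the result is indicative rather than the value for the full data set.

```r
# First five values of each variable, as printed by str(Data).
OverallAvg <- c(6.1, 25.14, 11.11, 8.35, 9.42)
PC1        <- c(-41.3, 100.1, -11.1, -29.7, -24.2)

# Pearson correlation: values close to 1 indicate a strong positive linear relationship.
r <- cor(OverallAvg, PC1)
r

# Intercept and slope of the least-squares line of PC1 on OverallAvg.
fit <- lm(PC1 ~ OverallAvg)
coef(fit)
```

With the full data set, the equivalent call would be cor(Data$OverallAvg, Data$PC1).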

The sinusoidal pattern between Latitude and OverallAvg reflects the way temperature changes with latitude across the globe.

4 Exploratory Impressions of Variables

The 2-dimensional scatterplot of the first two principal components appears to offer the greatest separation between groups among all pairs of variables and should be examined further. The larger points represent the centroids of the groups.

# Group centroids (mean PC1 and PC2 per group), with the group label kept as a column
class.means <- aggregate(cbind(PC1, PC2) ~ Group, data = Data, FUN = mean)

# Observations as small points, centroids as large semi-transparent points in the same colours
ggplot(Data, aes(x = PC1, y = PC2, colour = factor(Group))) +
  geom_point() +
  geom_point(data = class.means, size = 7, alpha = 1/2) +
  labs(title = "PC1 VS. PC2 (The Larger Icons are Group Means)", colour = "Group")

The first two principal components are of some use when it comes to clearly separating groups. Groups 1, 7 & 8 are reasonably well clustered and separated. However, group 5 slightly overlaps with groups 4 & 9 at the perimeter of the cluster and has a very elliptical shape. A similar issue appears to be present with groups 2, 3 & 6. Furthermore, the cluster structure of group 6 is interesting. On the face of things, group 6 appears to have an outlying subcluster of 6 observations, which may be more akin to group 8. Would a 3rd principal component be of any use in the 3-dimensional space?
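One way to probe overlaps of this kind, such as whether an outlying subcluster sits closer to another group, is to compare each observation's distance to every group centroid in the PC1-PC2 plane. A minimal sketch, using only the first five rows printed by str(Data) above; `toy`, `centroids`, `d` and the `Nearest` column are names introduced here for illustration, and Euclidean distance is an assumed choice of metric.

```r
# Stand-in for Data: the first five Group, PC1 and PC2 values from str(Data).
toy <- data.frame(
  Group = c(4, 6, 9, 5, 9),
  PC1   = c(-41.3, 100.1, -11.1, -29.7, -24.2),
  PC2   = c(-19.132, -13.852, 35.134, 0.822, 3.668)
)

# One centroid per group (rows are ordered by group label).
centroids <- aggregate(cbind(PC1, PC2) ~ Group, data = toy, FUN = mean)

# d[i, j] = Euclidean distance from observation i to centroid j.
d <- sapply(seq_len(nrow(centroids)), function(j) {
  sqrt((toy$PC1 - centroids$PC1[j])^2 + (toy$PC2 - centroids$PC2[j])^2)
})
colnames(d) <- centroids$Group

# Label of the closest centroid for each observation.
toy$Nearest <- centroids$Group[apply(d, 1, which.min)]
toy
```

Observations whose Nearest label differs from their Group label are the ones sitting in the overlap between clusters. With so few rows the centroids here are crude, so the output only illustrates the mechanics.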

Examining the kernel densities of Elevation for each group offers some interesting insight into how discrimination between groups may occur.

# Fixed colour per group, matching the colours referred to in the text
group.colours <- c("1" = "red", "2" = "blue", "3" = "black", "4" = "green", "5" = "orange",
                   "6" = "yellow", "7" = "pink", "8" = "purple", "9" = "brown")

# One kernel density curve per group, drawn directly from Data
ggplot(Data, aes(x = Elevation, colour = factor(Group))) +
  geom_density() +
  scale_colour_manual(values = group.colours, name = "Group") +
  labs(x = "Elevation", y = "Density", title = "Densities of Elevation for each Group")

Strikingly, the Elevation density for group 2 is very multimodal - with sharp peaks of density across a moderate range of elevation values. It can be seen in plots below that bodies of water in group 2 occur in very dense clusters in America, China and the Middle East. This may explain the sharp peaks in density.

Similarly, the densities of OverallAvg for each group offer an interesting insight into how discrimination between groups may occur.

# Fixed colour per group, matching the colours referred to in the text
group.colours <- c("1" = "red", "2" = "blue", "3" = "black", "4" = "green", "5" = "orange",
                   "6" = "yellow", "7" = "pink", "8" = "purple", "9" = "brown")

# One kernel density curve per group, drawn directly from Data
ggplot(Data, aes(x = OverallAvg, colour = factor(Group))) +
  geom_density() +
  scale_colour_manual(values = group.colours, name = "Group") +
  labs(x = "Overall Average", y = "Density", title = "Densities of The Mean Temperature of the Lake (Celsius) for each Group")

Taking into account the relatively small sizes of some groups, the densities appear to be unimodal and offer a great deal of separation between groups. For example, the OverallAvg values of group 4 appear to be significantly different from those of group 3. A Mann-Whitney test could assess such differences nonparametrically.
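The Mann-Whitney test mentioned above is available in base R as wilcox.test(). A minimal sketch, in which the five quantiles of OverallAvg printed by summary() for groups 3 and 4 (minimum, quartiles and maximum) stand in for the raw observations; the p-value is therefore illustrative only and is not a result for the full groups.

```r
# Stand-in samples: min, 1st quartile, median, 3rd quartile and max of OverallAvg,
# copied from the summary() output for Group3 and Group4.
group3 <- c(25.98, 27.83, 28.62, 30.20, 31.76)
group4 <- c(0.04835, 2.44587, 3.28293, 4.12047, 6.09557)

# Two-sided Mann-Whitney (Wilcoxon rank-sum) test of a location shift between the samples.
mw <- wilcox.test(group3, group4)
mw$statistic   # W counts the (group3, group4) pairs in which the group3 value is larger
mw$p.value

# Assumption: with the per-group data frames available, the call would be
# wilcox.test(Group3$OverallAvg, Group4$OverallAvg)
```

If several pairs of groups are compared in this way, the p-values would need adjusting for multiple testing, for example with p.adjust().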

From previous results and knowledge, it would appear that the overall temperature amplitude (OverallMeanAmp) may be of some importance in differentiating between groups.

# Fixed colour per group, matching the colours referred to in the text
group.colours <- c("1" = "red", "2" = "blue", "3" = "black", "4" = "green", "5" = "orange",
                   "6" = "yellow", "7" = "pink", "8" = "purple", "9" = "brown")

# One kernel density curve per group, drawn directly from Data
ggplot(Data, aes(x = OverallMeanAmp, colour = factor(Group))) +
  geom_density() +
  scale_colour_manual(values = group.colours, name = "Group") +
  labs(x = "Overall Mean Amplitude", y = "Density", title = "Densities of The Overall Amplitude Temperature of the Lake (Celsius) for each Group")

On the whole, OverallMeanAmp only appears to offer legitimate discrimination for groups at the extreme ends of the spectrum; most of the groups overlap substantially.

More interestingly, the density shape for most groups appears to be a minor modification of its OverallAvg counterpart. For example, note group 4 in green, group 5 in orange and group 7 in pink.

5 Exploratory Spatial Impressions

As the data we are analysing is geostatistical, it would be wise to analyse the data spatially by using various mapping techniques. This will allow us to gain a deeper understanding of the problem at hand.

library(leaflet)  # leaflet(), addTiles(), addCircles(); also exports %>%

leaflet() %>%
 addTiles() %>%
  addCircles(data=Group1, lat=~Latitude, lng= ~Longitude, radius=10000, color="red") %>%
  addCircles(data=Group2, lat=~Latitude, lng= ~Longitude, radius=10000, color="blue") %>%
  addCircles(data=Group3, lat=~Latitude, lng= ~Longitude, radius=10000, color="black") %>%
  addCircles(data=Group4, lat=~Latitude, lng= ~Longitude, radius=10000, color="green") %>%
  addCircles(data=Group5, lat=~Latitude, lng= ~Longitude, radius=10000, color="orange") %>%
  addCircles(data=Group6, lat=~Latitude, lng= ~Longitude, radius=10000, color="yellow") %>%
  addCircles(data=Group7, lat=~Latitude, lng= ~Longitude, radius=10000, color="pink") %>%
  addCircles(data=Group8, lat=~Latitude, lng= ~Longitude, radius=10000, color="purple") %>%
  addCircles(data=Group9, lat=~Latitude, lng= ~Longitude, radius=10000, color="brown")     

The bodies of water appear to be clustered spatially, which is to be expected. In particular, the groups appear to be arranged in horizontal bands by latitude. This may encode information about the temperature of the lakes.

Spatial analysis techniques such as kriging may be of some use here, both in predicting the group of future unlabelled bodies of water and in producing uncertainty maps for those predictions.

Interestingly, China seems to be the only distinct land mass that does not follow the horizontal layering of groups: groups 4 and 5 appear in the south west of China, where the latitude pattern alone would not place them. What is going on here?

leaflet()%>% addProviderTiles("Esri.WorldImagery")%>%
             setView(lng=88.7879, lat=30.1534, zoom=4)%>%
             addCircles(data=Group4, lat=~Latitude, lng= ~Longitude, radius=20000, color="green")%>%
             addCircles(data=Group5, lat=~Latitude, lng= ~Longitude, radius=20000, color="orange")%>%
             addMarkers(lng=88.7879, lat=30.1534, popup="Tibet, China")

This part of south west China is actually Tibet, which has mountainous terrain, akin to that of Canada and the Scandinavian countries. This suggests that temperature, elevation or a mixture of the two may play a significant role in deciding which label a body of water eventually gets.

Interestingly, along the south eastern region of Africa, group 6 is vertically clustered. This deviates from the horizontal pattern seen elsewhere. Why might this be the case?

leaflet()%>% setView(lng=32.16, lat=-6.07, zoom=4) %>%addTiles()%>%
         addMarkers(lng=33.32, lat=-1.3, popup="Lake : Victoria. Size : 2313 1km pixels. Largest in the region.")%>%
         addMarkers(lng=29.46, lat=-6.07, popup="Lake : Tanganyika. Size : 1148 1km pixels. 2nd largest in the region. ")%>%
         addMarkers(lng=34.59, lat=-11.96, popup="Lake : Niassa. Size : 1048 1km pixels. 3rd largest in the region.")%>%
         addMarkers(lng=32.16, lat=-7.86, popup="Lake : Rukwa. Size : 219 1km pixels. 4th largest in the region.")%>%
         addCircles(data=Group6, lat=~Latitude, lng= ~Longitude, radius=10000, color="yellow")

The bodies of water in this part of Africa appear to be part of a very connected network. The vertical spread of bodies of water in group 6 across the entire east and south eastern part of Africa could possibly be due to the influence that Lake Victoria, Lake Tanganyika & Lago Niassa have across the region, as these are extremely large in comparison to other bodies of water in the region.

6 Exploring the Water Body Type and Classification Labels

The data set contains three types of water body: lagoon, lake and reservoir. Plotting the type counts for each group may give us an insight into the relationship between water body type and group.

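library(plotly)  # ggplotly()

Data$Group <- factor(Data$Group)
b <- ggplot(Data, aes(x = Group, fill = Type)) + geom_bar(position = "dodge")

# Convert the static bar chart into an interactive plotly widget
ggplotly(b)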
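The counts behind this bar chart can also be inspected as a contingency table. A minimal sketch, using only the first four Group and Type values printed by str(Data) above (all four happen to be lakes); `toy`, `counts` and `shares` are names introduced here for illustration.

```r
# Stand-in for Data: the first four Group and Type values from str(Data).
toy <- data.frame(
  Group = factor(c(4, 6, 9, 5)),
  Type  = c("Lake", "Lake", "Lake", "Lake")
)

# Cross-tabulate group against water body type: one row per group, one column per type.
counts <- table(Group = toy$Group, Type = toy$Type)
counts

# Row-wise proportions: the share of each type within a group.
shares <- prop.table(counts, margin = 1)
shares
```

With the full data set, table(Group = Data$Group, Type = Data$Type) would give the same counts that the bars display, and the row-wise proportions make groups of very different sizes easier to compare.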