This report presents an exploratory data analysis (EDA) of pH data from Lake 239 at the IISD Experimental Lakes Area (IISD-ELA). The analysis was performed with R packages.
The dataset I analyzed comprises monitoring records of chemical parameters for Lake 239 at IISD-ELA from 1990 to 2019. Lake 239 is located east of Kenora. A zoomed-in view of Lake 239 is available on the IISD-ELA map.
This lake is important because it has not been manipulated in any way, so any changes we see in the lake are a product of natural variation and atmospheric change 1. The dataset is available from the repository.
Primary sample information is listed in Table 1 and Table 2.

| Information | Description |
|---|---|
| Latitude | 49.66103 |
| Longitude | -93.71315 |
| Location ID | 239LAEIF |
| Location Name | East Inflow |
| Horizontal Coordinate Reference System | NAD83, WGS84 |

| Information | Description |
|---|---|
| Location Type | River/Stream |
| Location | Inflow |
| Sample Type | Surface Water |
| Characteristics | pH |

The structure of the dataset is as follows:
## 'data.frame': 5337 obs. of 36 variables:
## $ DatasetName : chr "ELA LTER Chemistry" "ELA LTER Chemistry" "ELA LTER Chemistry" "ELA LTER Chemistry" ...
## $ MonitoringLocationID : chr "239LAEIF" "239LAEIF" "239LAEIF" "239LAEIF" ...
## $ MonitoringLocationName : chr "Lake 239 East Inflow" "Lake 239 East Inflow" "Lake 239 East Inflow" "Lake 239 East Inflow" ...
## $ MonitoringLocationLatitude : num 49.7 49.7 49.7 49.7 49.7 ...
## $ MonitoringLocationLongitude : num -93.7 -93.7 -93.7 -93.7 -93.7 ...
## $ MonitoringLocationHorizontalCoordinateReferenceSystem: chr "NAD83" "NAD83" "NAD83" "NAD83" ...
## $ MonitoringLocationType : chr "River/Stream" "River/Stream" "River/Stream" "River/Stream" ...
## $ ActivityType : chr "Sample-Routine" "Sample-Routine" "Sample-Routine" "Sample-Routine" ...
## $ ActivityMediaName : chr "Surface Water" "Surface Water" "Surface Water" "Surface Water" ...
## $ ActivityStartDate : chr "1990-03-24" "1990-03-24" "1990-03-24" "1990-03-24" ...
## $ ActivityStartTime : chr "" "" "" "" ...
## $ ActivityEndDate : chr "" "" "" "" ...
## $ ActivityEndTime : chr "" "" "" "" ...
## $ ActivityDepthHeightMeasure : int 0 0 0 0 0 0 0 0 0 0 ...
## $ ActivityDepthHeightUnit : chr "m" "m" "m" "m" ...
## $ SampleCollectionEquipmentName : chr "Pump/Submersible" "Pump/Submersible" "Pump/Submersible" "Pump/Submersible" ...
## $ CharacteristicName : chr "Organic carbon" "Organic carbon" "Organic carbon" "Organic carbon" ...
## $ MethodSpeciation : chr "" "" "" "" ...
## $ ResultSampleFraction : chr "Filtered, Lab" "Filtered, Lab" "Filtered, Lab" "Filtered, Lab" ...
## $ ResultValue : num 3510 3770 3900 3540 3420 3200 3180 3360 94 88 ...
## $ ResultUnit : chr "umol/L" "umol/L" "umol/L" "umol/L" ...
## $ ResultValueType : chr "" "" "" "" ...
## $ ResultDetectionCondition : chr "" "" "" "" ...
## $ ResultDetectionQuantitationLimitMeasure : num 10 10 10 10 5 5 5 5 1 1 ...
## $ ResultDetectionQuantitationLimitUnit : chr "umol/L" "umol/L" "umol/L" "umol/L" ...
## $ ResultDetectionQuantitationLimitType : chr "Method Detection Level" "Method Detection Level" "Method Detection Level" "Method Detection Level" ...
## $ ResultStatusID : chr "Validated" "Validated" "Validated" "Validated" ...
## $ ResultComment : chr NA NA NA NA ...
## $ ResultAnalyticalMethodID : logi NA NA NA NA NA NA ...
## $ ResultAnalyticalMethodContext : chr "" "" "" "" ...
## $ ResultAnalyticalMethodName : chr "" "" "" "" ...
## $ AnalysisStartDate : chr "" "" "" "" ...
## $ AnalysisStartTime : logi NA NA NA NA NA NA ...
## $ AnalysisStartTimeZone : logi NA NA NA NA NA NA ...
## $ LaboratoryName : chr "" "" "" "" ...
## $ LaboratorySampleID : chr "K141" "K137" "K139" "K140" ...
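
The listing above shows a long-format table: each row holds one measurement, and `CharacteristicName` says which parameter it is. The pH rows therefore have to be selected before any analysis. A minimal sketch of that step, assuming a small made-up data frame `chem` that mimics three of the columns above (the values are illustrative, and the label `"pH"` is assumed from Table 2 rather than confirmed in the data):

```r
# Toy stand-in for the full chemistry table (values are illustrative only).
chem <- data.frame(
  ActivityStartDate  = c("1990-03-24", "1990-03-24", "1990-03-31"),
  CharacteristicName = c("Organic carbon", "pH", "pH"),
  ResultValue        = c(3510, 6.1, 6.3),
  stringsAsFactors   = FALSE
)

# Keep only the pH rows (assumes the characteristic is labelled exactly "pH").
ph <- subset(chem, CharacteristicName == "pH")

# Dates are stored as text in the dataset, so parse them for later grouping.
ph$ActivityStartDate <- as.Date(ph$ActivityStartDate)

str(ph)
```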
While exploring the dataset, I found that 24 Lab IDs were assigned to samples on different dates. The dataset contains no AnalysisStartTime data, so I don’t know whether pH was measured at different times.

| Date | Freq |
|---|---|
| 1990-03-24 | 4 |
| 1990-03-31 | 2 |
| 1990-04-01 | 2 |
| 1990-04-02 | 2 |
| 1990-04-03 | 2 |
| 1990-04-12 | 4 |
| 1990-06-02 | 3 |
| 1990-06-17 | 4 |
| 1990-06-19 | 3 |
| 1990-06-20 | 5 |
| 1990-06-21 | 3 |
| 1990-07-07 | 2 |
| 1994-04-13 | 2 |
| 2009-06-24 | 12 |
| 2009-07-01 | 4 |
| 2016-06-13 | 2 |
| 2016-08-15 | 2 |
| 2016-09-01 | 14 |
| 2016-09-07 | 7 |
| 2016-09-12 | 2 |
| 2016-09-28 | 8 |
| 2016-10-03 | 2 |
| 2016-10-17 | 2 |
| 2016-10-24 | 2 |
| 2017-10-16 | 2 |
| 2018-06-12 | 2 |
| 2018-07-10 | 2 |
| 2019-06-25 | 2 |
| 2019-07-09 | 2 |
| 2019-07-16 | 2 |
| 2019-08-13 | 2 |
| 2019-09-03 | 2 |
| 2019-09-10 | 2 |
| 2019-09-17 | 2 |
| 2019-09-24 | 2 |
| 2019-10-01 | 4 |
| 2019-10-08 | 2 |
| 2019-10-22 | 2 |
| 2019-10-29 | 2 |
| 2019-11-04 | 2 |
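
A frequency table of this shape can be built with base R's `table()`. The sketch below uses made-up dates, and it assumes the table above keeps only dates with more than one record (every listed `Freq` is at least 2, but the source does not state the filter):

```r
# Illustrative sampling dates (not the real records).
dates <- as.Date(c("1990-03-24", "1990-03-24", "1990-03-31",
                   "1990-04-01", "1990-04-01", "1990-04-01"))

# Count records per date; the result has columns Date and Freq.
freq <- as.data.frame(table(Date = dates), stringsAsFactors = FALSE)

# Assumed filter: keep only dates with more than one record.
multi <- freq[freq$Freq > 1, ]
print(multi)
```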
The sampling frequency for pH is illustrated in the following
heatmap.
Figure 1: Number of pH Records per Week
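
The counts behind a records-per-week heatmap can be tabulated with base R before plotting. This is only a sketch on made-up dates. The week definition used here (`%U`, week of the year with weeks starting on Sunday) is an assumption, since the original week convention is not stated:

```r
# Illustrative sampling dates (not the real records).
dates <- as.Date(c("1990-03-24", "1990-03-31", "1990-04-01", "1990-04-02"))

# Label each record with its year and week of the year.
weeks <- data.frame(
  year = format(dates, "%Y"),
  week = format(dates, "%U")  # assumed convention: weeks start on Sunday
)

# Cross-tabulate and drop empty year/week combinations.
counts <- as.data.frame(table(weeks), stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]
print(counts)
```

Each row of `counts` corresponds to one heatmap cell, with `Freq` as the fill value.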
Any pH value that falls outside \(\textrm{median} \pm 3 \times \textrm{MAD}\), where MAD is the median absolute deviation, is considered abnormal.
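
The rule can be sketched in a few lines of base R. The pH values below are made up for illustration. The sketch uses `mad(x, constant = 1)`, the raw median absolute deviation. R's default `constant = 1.4826` rescales the MAD for normally distributed data, and the source does not say which variant was used, so this choice is an assumption:

```r
# Illustrative pH values; 4.2 and 8.9 are planted as extremes.
ph <- c(6.1, 6.3, 6.2, 6.0, 6.4, 4.2, 6.3, 8.9)

center <- median(ph)
# constant = 1 gives the unscaled median absolute deviation (assumed variant).
spread <- mad(ph, constant = 1)

lower <- center - 3 * spread
upper <- center + 3 * spread

# Flag values outside median +/- 3 * MAD.
abnormal <- ph < lower | ph > upper
print(ph[abnormal])
```

With the default scaling constant the band would be about 1.48 times wider, so fewer values would be flagged.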
The largest number of abnormal pH values occurred in 1990, while the highest proportion of abnormal values occurred in 2007.
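
The count and the proportion can point to different years, because the proportion depends on how many records each year has. A small sketch with made-up flags (the years echo the text, but the numbers are illustrative only):

```r
# Illustrative input: one row per pH record with an abnormal flag.
rec <- data.frame(
  year     = c(rep(1990, 6), rep(2007, 2)),
  abnormal = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# Number of abnormal records per year (TRUE counts as 1).
n_abnormal <- tapply(rec$abnormal, rec$year, sum)

# Share of abnormal records per year.
ratio <- tapply(rec$abnormal, rec$year, mean)

# Year with the most abnormal records, and year with the highest share.
most_abnormal <- names(which.max(n_abnormal))
highest_ratio <- names(which.max(ratio))
```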
For the following plots, multiple pH values recorded on the same day are averaged into a single daily value. In the following heatmap, 2007 stands out from the other years because pH shifted from low to high as the seasons changed.
Figure 3: pH Heatmap in 1990 - 2019
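
The daily averaging used for these plots can be sketched with `aggregate()`. The data frame `ph` and its column names are hypothetical, and the values are made up:

```r
# Illustrative records: two measurements on the first date, one on the second.
ph <- data.frame(
  date  = as.Date(c("1990-03-24", "1990-03-24", "1990-03-31")),
  value = c(6.0, 6.2, 6.3)
)

# Collapse multiple measurements per day into their mean.
daily <- aggregate(value ~ date, data = ph, FUN = mean)
print(daily)
```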
In the following scatter plot, most yearly pH medians fall between 6.0 and 6.5. The year 2007 deserves attention because its median is 4.2.
Figure 4: pH Measurement with Trend
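
Yearly medians like those discussed above can be computed from daily values with `aggregate()`. The sketch assumes a hypothetical data frame `daily` with made-up values. It is not the original plotting code:

```r
# Illustrative daily pH values for two years.
daily <- data.frame(
  date  = as.Date(c("1990-05-01", "1990-06-01", "1990-07-01",
                    "2007-05-01", "2007-06-01", "2007-07-01")),
  value = c(6.0, 6.2, 6.4, 4.0, 4.2, 6.1)
)

# Extract the year, then take the median within each year.
daily$year <- as.integer(format(daily$date, "%Y"))
yearly <- aggregate(value ~ year, data = daily, FUN = median)
print(yearly)
```

The median is less sensitive than the mean to a few extreme readings, which suits data where abnormal values have already been flagged.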
Created Date: 2022-05-18
Last Modified Date: 2022-09-27