In this assignment we will look at water quality data from the City of Austin’s online data portal: https://data.austintexas.gov/Environment/Water-Quality-Sampling-Data/5tye-7ray.
The dataset contains the results of about 1,000 water quality tests performed on water bodies in Austin in 2020.
We will use tidyverse packages to clean and study the dataset.
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
We will import the CSV directly from the City of Austin site and study the data structure before deciding what analysis we would like to perform on it.
water <- read_csv('https://data.austintexas.gov/resource/5tye-7ray.csv')
glimpse(water) # studying the data structure
## Rows: 1,000
## Columns: 24
## $ watershed <chr> "Lady Bird Lake", "Lady Bird Lake", "Lady Bird Lake",…
## $ sample_date <dttm> 2020-08-18 15:10:00, 2020-08-18 15:10:00, 2020-08-18…
## $ site_name <chr> "Lagoon at Festival Beach", "Lagoon at Festival Beach…
## $ site_type <chr> "Lake", "Lake", "Lake", "Lake", "Lake", "Lake", "Lake…
## $ medium <chr> "Surface Water", "Surface Water", "Surface Water", "S…
## $ param_type <chr> "Solids/Conductivity", "Flow/Rainfall", "Alkalinity/H…
## $ parameter <chr> "CONDUCTIVITY", "DAYS AFTER STORM", "PH", "FLOW SEVER…
## $ qualifier <chr> NA, ">", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ result <dbl> 475.40, 14.00, 8.00, 3.00, 22514.00, 9.88, 1.00, 30.7…
## $ unit <chr> "uS/cm", "Days", "Standard units", "None", "None", "M…
## $ filter <chr> "Total", "Total", "Total", "Total", "Total", "Dissolv…
## $ sample_id <chr> "1997-Lagoon @ Festival beach SURF", "1997-Lagoon @ F…
## $ sample_site_no <dbl> 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1, 1,…
## $ depth_in_meters <dbl> 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 7.3, 7.3, 7.3…
## $ method <chr> "HYDROLAB", "NONE", "HYDROLAB", "TCEQ FLOW SEVERITY",…
## $ qc_flag <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ project <chr> "Lady Bird Lake Harmful Algal Bloom Study", "Lady Bir…
## $ location <chr> "\n, \n(30.247941716350812, -97.72454604506493)", "\…
## $ ref_no <dbl> 2794085, 2794103, 2794066, 2794152, 2794194, 2794080,…
## $ lat_dd_wgs84 <dbl> 30.24794, 30.24794, 30.24794, 30.24794, 30.24794, 30.…
## $ lon_dd_wgs84 <dbl> -97.72455, -97.72455, -97.72455, -97.72455, -97.72455…
## $ sample_ref_no <dbl> 572564, 572564, 572564, 572564, 572564, 572564, 57257…
## $ time_null <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
## $ qc_type <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
After studying the dataset, I have decided to focus my analysis on the pH level and water temperature for these observations. Therefore, I will keep only the fields I am interested in.
water <- tibble('Site_Name'      = water$site_name,
                'Site_Type'      = water$site_type,
                'Sample_Time'    = water$sample_date,
                'Parameter_Type' = water$param_type,
                'Parameter'      = water$parameter,
                'Results'        = water$result,
                'Unit'           = water$unit)
glimpse(water)
## Rows: 1,000
## Columns: 7
## $ Site_Name <chr> "Lagoon at Festival Beach", "Lagoon at Festival Beach"…
## $ Site_Type <chr> "Lake", "Lake", "Lake", "Lake", "Lake", "Lake", "Lake"…
## $ Sample_Time <dttm> 2020-08-18 15:10:00, 2020-08-18 15:10:00, 2020-08-18 …
## $ Parameter_Type <chr> "Solids/Conductivity", "Flow/Rainfall", "Alkalinity/Ha…
## $ Parameter <chr> "CONDUCTIVITY", "DAYS AFTER STORM", "PH", "FLOW SEVERI…
## $ Results <dbl> 475.40, 14.00, 8.00, 3.00, 22514.00, 9.88, 1.00, 30.77…
## $ Unit <chr> "uS/cm", "Days", "Standard units", "None", "None", "MG…
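The same column selection and renaming can also be written in one step with dplyr's select(), which accepts new_name = old_name pairs. The following is a minimal sketch on a small made-up tibble; `raw` and its values are placeholders, not rows from the Austin data.

```r
library(dplyr)

# Hypothetical stand-in for the imported data; values are illustrative only
raw <- tibble(
  site_name = c("Site A", "Site B"),
  site_type = c("Lake", "Stream"),
  parameter = c("PH", "WATER TEMPERATURE"),
  result    = c(7.5, 25.0),
  unit      = c("Standard units", "Deg. Celsius"),
  method    = c("HYDROLAB", "HYDROLAB")  # a column we do not want to keep
)

# select() keeps only the listed columns and renames them in the same call
slim <- select(raw,
               Site_Name = site_name,
               Site_Type = site_type,
               Parameter = parameter,
               Results   = result,
               Unit      = unit)

names(slim)
```

Selecting by name keeps the code readable and means nothing breaks if the portal ever reorders its columns.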
Now that we have reduced our dataset to the variables of interest, let’s filter it down further to the observations where the parameter is pH or water temperature.
unique(water$Parameter) # looking at unique parameter values
## [1] "CONDUCTIVITY"
## [2] "DAYS AFTER STORM"
## [3] "PH"
## [4] "FLOW SEVERITY CODE (1=NONE;2=LOW;3=NORM;4=FLOOD;5=HIGH;6=DRY)"
## [5] "FIELD INSTRUMENT SERIAL NUMBER"
## [6] "DISSOLVED OXYGEN"
## [7] "CODE FOR SAMPLE COLLECTION APP"
## [8] "WATER TEMPERATURE"
## [9] "SECCHI DISK DEPTH"
## [10] "NUMBER OF AUSTIN BLIND SALAMANDERS PHOTOGRAPHED"
## [11] "TOTAL TIME SPENT"
## [12] "LIGHT INTENSITY"
## [13] "BSS SALS NOT PHOTGRAPHED >2 INCHES"
## [14] "BSS SALS NOT PHOTGRAPHED <1 INCH"
## [15] "ABS SALS NOT PHOTGRAPHED <1 INCH"
## [16] "ABS SALS NOT PHOTGRAPHED >2 INCHES"
## [17] "BSS SALS NOT PHOTGRAPHED 1-2 INCHES"
## [18] "ABS SALS NOT PHOTGRAPHED 1-2 INCHES"
## [19] "NUMBER OF BARTON SPRINGS SALAMANDERS PHOTOGRAPHED"
## [20] "FLOW"
## [21] "POECILIIDAE (GAMBUSIA)"
## [22] "BASS (MICROPTERUS)"
## [23] "CICHLIDAE"
## [24] "SUNFISH (LEPOMIS)"
## [25] "OTHER FISH"
## [26] "OXIDATION-REDUCTION_POTENTIAL"
## [27] "PLANT HEIGHT"
## [28] "RUHU RUELLIA HUMILIS"
## [29] "MEAZ MELIA AZEDARACH"
## [30] "ACOS ACALYPHA OSTRYIFOLIA"
## [31] "RUNU RUELLIA NUDIFLORA"
## [32] "MIJA MIRABILIS JALAPA"
## [33] "CAVI2 CALYPTOCARPUS VIALIS"
## [34] "BRCA6 BROMUS CATHARTICUS"
## [35] "PERCENT COVER"
## [36] "DEPA6 DESMODIUM PANICULATUM"
## [37] "ACPH3 ACALYPHA PHLEOIDES"
## [38] "CAIL2 CARYA ILLINOINENSIS"
## [39] "CANOPY COVER"
## [40] "CELA CELTIS LAEVIGATA"
## [41] "COCA COCCULUS CAROLINUS"
## [42] "ABWR ABUTILON WRIGHTII"
## [43] "OXDI2 OXALIS DILLENII"
## [44] "PAHY PARTHENIUM HYSTEROPHORUS"
## [45] "PADI3 PASPALUM DILATATUM"
## [46] "SOHA SORGHUM HALEPENSE"
## [47] "RHYNCOSIA SP"
## [48] "CHPR6 CHAMAESYCE PROSTRATA"
## [49] "RHPH2 RHYNCHOSIDA PHYSOCALYX"
## [50] "CAREX CAREX SPP."
## [51] "IPCOC2 IPOMOEA CORDATOTRILOBA VAR. CORDATOTRILOBA"
## [52] "CYES CYPERUS ESCULENTUS"
## [53] "TOAR TORILIS ARVENSIS"
## [54] "MOCI MONARDA CITRIODORA"
## [55] "EUDE4 EUPHORBIA DENTATA"
## [56] "ULCR ULMUS CRASSIFOLIA"
## [57] "AMTR AMBROSIA TRIFIDA"
## [58] "TORA2 TOXICODENDRON RADICANS"
## [59] "PAQU2 PARTHENOCISSUS QUINQUEFOLIA"
## [60] "SILA20 SIDEROXYLON LANUGINOSUM"
## [61] "VIMU2 VITIS MUSTANGENSIS"
## [62] "TRDA3 TRIPSACUM DACTYLOIDES"
## [63] "TAOFO TARAXACUM OFFICINALE"
## [64] "TRAGI TRAGIA SPP."
## [65] "ELVI3 ELYMUS VIRGINICUS"
## [66] "UNKNOWN PLANT 1"
## [67] "SMBO2 SMILAX BONA-NOX"
## [68] "VIMO2 VITIS MONTICOLA"
## [69] "RIHU2 RIVINA HUMILIS"
## [70] "ULAM ULMUS AMERICANA"
## [71] "CANOPY COVER CENTER"
## [72] "LITTER"
## [73] "NUMBER OF ROCKS SCRAPED"
## [74] "VELOCITY/DEPTH REGIMES"
## [75] "RIPARIAN VEGETATIVE ZONE WIDTH (RIGHT BANK)"
## [76] "PERCENT ALGAE COVER"
## [77] "VEGETATIVE PROTECTION (LEFT BANK)"
## [78] "BANK STABILITY (RIGHT BANK)"
## [79] "FREQUENCY OF RIFFLES"
## [80] "NUMBER OF SURBERS"
## [81] "BANK STABILITY (LEFT BANK)"
## [82] "CANOPY COVER UPSTREAM"
## [83] "CLARITY"
## [84] "SEDIMENT DEPOSITION"
## [85] "SURFACE APPEARANCE"
## [86] "VEGETATIVE PROTECTION (RIGHT BANK)"
## [87] "EMBEDDEDNESS"
## [88] "RIPARIAN VEGETATIVE ZONE WIDTH (LEFT BANK)"
## [89] "# OF GRIDS SUBSAMPLED"
## [90] "ODOR"
## [91] "CANOPY COVER DOWNSTREAM"
## [92] "EPIFAUNAL SUBSTRATE"
## [93] "CHANNEL ALTERATION"
## [94] "CHANNEL FLOW STATUS"
## [95] "SESC2 SETARIA SCHEEELEI"
## [96] "DEIL DESMANTHUS ILLINOENSIS"
## [97] "CYDA CYNODON DACTYLON"
## [98] "UNKNOWN GRASS 1"
## [99] "PHVI17 PHYSALIS VISCOSA"
## [100] "FRAXI FRAXINUS SPP."
## [101] "AREA SAMPLED"
## [102] "HEHE HEDERA HELIX"
## [103] "ARAL3 ARGEMONE ALBIFLORA"
## [104] "HEAN3 HELIANTHUS ANNUUS"
## [105] "UNKNOWN GRASS 2"
## [106] "NALE3 NASSELLA LEUCOTRICHA"
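With this many distinct values, it can help to see how often each one occurs rather than only listing them. The sketch below uses dplyr's count() on a made-up Parameter column; `toy` is a hypothetical stand-in for the water tibble.

```r
library(dplyr)

# Hypothetical parameter column; the real data has 106 distinct values
toy <- tibble(Parameter = c("PH", "WATER TEMPERATURE", "PH", "FLOW", "PH"))

# count() tallies rows per value; sort = TRUE puts the most frequent first
param_counts <- count(toy, Parameter, sort = TRUE)
param_counts
```

The resulting tibble has one row per parameter and an `n` column with the number of observations.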
It looks like there are a lot of values stored under Parameter. However, I am only interested in the water pH and temperature. I will create another tibble that is a subset of the water tibble and contains only the observations where Parameter is PH or WATER TEMPERATURE.
water2 <- filter(water, Parameter %in% c('PH', 'WATER TEMPERATURE'))
knitr::kable(water2)
| Site_Name | Site_Type | Sample_Time | Parameter_Type | Parameter | Results | Unit |
|---|---|---|---|---|---|---|
| Lagoon at Festival Beach | Lake | 2020-08-18 15:10:00 | Alkalinity/Hardness/pH | PH | 8.00 | Standard units |
| Lagoon at Festival Beach | Lake | 2020-08-18 15:10:00 | Conventionals | WATER TEMPERATURE | 30.77 | Deg. Celsius |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:25:00 | Alkalinity/Hardness/pH | PH | 7.26 | Standard units |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:25:00 | Conventionals | WATER TEMPERATURE | 27.82 | Deg. Celsius |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Conventionals | WATER TEMPERATURE | 28.15 | Deg. Celsius |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Conventionals | WATER TEMPERATURE | 27.97 | Deg. Celsius |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Conventionals | WATER TEMPERATURE | 27.82 | Deg. Celsius |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Alkalinity/Hardness/pH | PH | 7.63 | Standard units |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Alkalinity/Hardness/pH | PH | 7.93 | Standard units |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Conventionals | WATER TEMPERATURE | 30.33 | Deg. Celsius |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Conventionals | WATER TEMPERATURE | 28.81 | Deg. Celsius |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Alkalinity/Hardness/pH | PH | 8.00 | Standard units |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Conventionals | WATER TEMPERATURE | 28.24 | Deg. Celsius |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Alkalinity/Hardness/pH | PH | 7.68 | Standard units |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Alkalinity/Hardness/pH | PH | 7.28 | Standard units |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Alkalinity/Hardness/pH | PH | 7.38 | Standard units |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Alkalinity/Hardness/pH | PH | 7.55 | Standard units |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Conventionals | WATER TEMPERATURE | 29.12 | Deg. Celsius |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Conventionals | WATER TEMPERATURE | 28.28 | Deg. Celsius |
| Lady Bird Lake @ Basin (AC) | Lake | 2020-08-18 14:15:00 | Alkalinity/Hardness/pH | PH | 7.97 | Standard units |
| Lady Bird Lake @ 1st St (CC) | Lake | 2020-08-18 13:45:00 | Alkalinity/Hardness/pH | PH | 7.46 | Standard units |
| Lady Bird Lake @ 1st St (CC) | Lake | 2020-08-18 13:45:00 | Conventionals | WATER TEMPERATURE | 27.69 | Deg. Celsius |
| Lady Bird Lake @ 1st St (CC) | Lake | 2020-08-18 13:35:00 | Conventionals | WATER TEMPERATURE | 27.84 | Deg. Celsius |
| Lady Bird Lake @ 1st St (CC) | Lake | 2020-08-18 13:35:00 | Alkalinity/Hardness/pH | PH | 7.62 | Standard units |
| Lady Bird Lake @ 1st St (CC) | Lake | 2020-08-18 13:35:00 | Conventionals | WATER TEMPERATURE | 27.71 | Deg. Celsius |
| Lady Bird Lake @ 1st St (CC) | Lake | 2020-08-18 13:35:00 | Alkalinity/Hardness/pH | PH | 7.65 | Standard units |
| Lady Bird Lake @ 1st St (CC) | Lake | 2020-08-18 13:35:00 | Alkalinity/Hardness/pH | PH | 7.67 | Standard units |
| Lady Bird Lake @ 1st St (CC) | Lake | 2020-08-18 13:35:00 | Alkalinity/Hardness/pH | PH | 7.60 | Standard units |
| Lady Bird Lake @ 1st St (CC) | Lake | 2020-08-18 13:35:00 | Conventionals | WATER TEMPERATURE | 27.77 | Deg. Celsius |
| Lady Bird Lake @ 1st St (CC) | Lake | 2020-08-18 13:35:00 | Conventionals | WATER TEMPERATURE | 27.73 | Deg. Celsius |
| Lady Bird Lake @ 1st St (CC) | Lake | 2020-08-18 13:35:00 | Conventionals | WATER TEMPERATURE | 28.01 | Deg. Celsius |
| Lady Bird Lake @ 1st St (CC) | Lake | 2020-08-18 13:35:00 | Alkalinity/Hardness/pH | PH | 7.61 | Standard units |
| Lady Bird Lake @ Shoal Creek | Lake | 2020-08-18 13:20:00 | Alkalinity/Hardness/pH | PH | 7.88 | Standard units |
| Lady Bird Lake @ Shoal Creek | Lake | 2020-08-18 13:20:00 | Conventionals | WATER TEMPERATURE | 30.17 | Deg. Celsius |
| Lady Bird Lake @ Powerline | Lake | 2020-08-18 12:55:00 | Alkalinity/Hardness/pH | PH | 7.70 | Standard units |
| Lady Bird Lake @ Powerline | Lake | 2020-08-18 12:55:00 | Conventionals | WATER TEMPERATURE | 28.23 | Deg. Celsius |
| Barton Creek Mouth upstream @ Lady Bird Lake | Stream | 2020-08-18 12:30:00 | Alkalinity/Hardness/pH | PH | 7.63 | Standard units |
| Barton Creek Mouth upstream @ Lady Bird Lake | Stream | 2020-08-18 12:30:00 | Conventionals | WATER TEMPERATURE | 26.02 | Deg. Celsius |
| Lady Bird Lake @ Red Bud Isle (EC) | Lake | 2020-08-18 11:50:00 | Alkalinity/Hardness/pH | PH | 7.61 | Standard units |
| Lady Bird Lake @ Red Bud Isle (EC) | Lake | 2020-08-18 11:50:00 | Conventionals | WATER TEMPERATURE | 27.20 | Deg. Celsius |
| Lady Bird Lake @ Red Bud Isle (EC) | Lake | 2020-08-18 11:40:00 | Conventionals | WATER TEMPERATURE | 27.46 | Deg. Celsius |
| Lady Bird Lake @ Red Bud Isle (EC) | Lake | 2020-08-18 11:40:00 | Alkalinity/Hardness/pH | PH | 7.72 | Standard units |
| Lady Bird Lake @ Red Bud Isle (EC) | Lake | 2020-08-18 11:40:00 | Alkalinity/Hardness/pH | PH | 7.71 | Standard units |
| Lady Bird Lake @ Red Bud Isle (EC) | Lake | 2020-08-18 11:40:00 | Conventionals | WATER TEMPERATURE | 27.40 | Deg. Celsius |
| Lady Bird Lake @ Red Bud Isle (EC) | Lake | 2020-08-18 11:40:00 | Alkalinity/Hardness/pH | PH | 7.77 | Standard units |
| Lady Bird Lake @ Red Bud Isle (EC) | Lake | 2020-08-18 11:40:00 | Conventionals | WATER TEMPERATURE | 27.47 | Deg. Celsius |
| Lady Bird Lake @ Red Bud Isle (EC) | Lake | 2020-08-18 11:40:00 | Alkalinity/Hardness/pH | PH | 7.75 | Standard units |
| Lady Bird Lake @ Red Bud Isle (EC) | Lake | 2020-08-18 11:40:00 | Conventionals | WATER TEMPERATURE | 27.49 | Deg. Celsius |
| Redbud West of Parking Lot | Lake | 2020-08-18 11:20:00 | Alkalinity/Hardness/pH | PH | 7.57 | Standard units |
| Redbud West of Parking Lot | Lake | 2020-08-18 11:20:00 | Conventionals | WATER TEMPERATURE | 27.04 | Deg. Celsius |
| 6012 Florencia Lane | Spring | 2020-08-14 10:15:00 | Conventionals | WATER TEMPERATURE | 29.08 | Deg. Celsius |
| 6012 Florencia Lane | Spring | 2020-08-14 10:15:00 | Alkalinity/Hardness/pH | PH | 6.95 | Standard units |
| Barton Spring Pool @ Downstream Dam | Stream | 2020-08-12 12:10:00 | Conventionals | WATER TEMPERATURE | 22.17 | Deg. Celsius |
| Barton Spring Pool @ Downstream Dam | Stream | 2020-08-12 12:10:00 | Alkalinity/Hardness/pH | PH | 7.04 | Standard units |
| Barton Spring | Spring | 2020-08-12 12:05:00 | Alkalinity/Hardness/pH | PH | 6.97 | Standard units |
| Barton Spring | Spring | 2020-08-12 12:05:00 | Conventionals | WATER TEMPERATURE | 21.61 | Deg. Celsius |
| Eliza Spring | Spring | 2020-08-12 08:37:00 | Conventionals | WATER TEMPERATURE | 21.56 | Deg. Celsius |
| Eliza Spring | Spring | 2020-08-12 08:37:00 | Alkalinity/Hardness/pH | PH | 7.18 | Standard units |
| Lagoon at Festival Beach | Lake | 2020-08-11 12:40:00 | Conventionals | WATER TEMPERATURE | 29.07 | Deg. Celsius |
| Lagoon at Festival Beach | Lake | 2020-08-11 12:40:00 | Alkalinity/Hardness/pH | PH | 7.69 | Standard units |
| Redbud West of Parking Lot | Lake | 2020-08-11 11:55:00 | Conventionals | WATER TEMPERATURE | 27.74 | Deg. Celsius |
| Redbud West of Parking Lot | Lake | 2020-08-11 11:55:00 | Alkalinity/Hardness/pH | PH | 7.58 | Standard units |
| Barton Creek Mouth upstream @ Lady Bird Lake | Stream | 2020-08-11 11:10:00 | Conventionals | WATER TEMPERATURE | 23.07 | Deg. Celsius |
| Barton Creek Mouth upstream @ Lady Bird Lake | Stream | 2020-08-11 11:10:00 | Alkalinity/Hardness/pH | PH | 7.39 | Standard units |
| Lady Bird Lake @ Powerline | Lake | 2020-08-11 10:40:00 | Alkalinity/Hardness/pH | PH | 7.71 | Standard units |
| Lady Bird Lake @ Powerline | Lake | 2020-08-11 10:40:00 | Conventionals | WATER TEMPERATURE | 27.45 | Deg. Celsius |
| Redbud West of Parking Lot | Lake | 2020-08-04 12:40:00 | Conventionals | WATER TEMPERATURE | 27.20 | Deg. Celsius |
| Redbud West of Parking Lot | Lake | 2020-08-04 12:40:00 | Alkalinity/Hardness/pH | PH | 7.66 | Standard units |
| Lagoon at Festival Beach | Lake | 2020-08-04 12:05:00 | Alkalinity/Hardness/pH | PH | 8.09 | Standard units |
| Lagoon at Festival Beach | Lake | 2020-08-04 12:05:00 | Conventionals | WATER TEMPERATURE | 30.27 | Deg. Celsius |
| Barton Creek Mouth upstream @ Lady Bird Lake | Stream | 2020-08-04 10:55:00 | Conventionals | WATER TEMPERATURE | 23.26 | Deg. Celsius |
| Barton Creek Mouth upstream @ Lady Bird Lake | Stream | 2020-08-04 10:55:00 | Alkalinity/Hardness/pH | PH | 7.09 | Standard units |
| Lady Bird Lake @ Powerline | Lake | 2020-08-04 10:30:00 | Alkalinity/Hardness/pH | PH | 7.54 | Standard units |
| Lady Bird Lake @ Powerline | Lake | 2020-08-04 10:30:00 | Conventionals | WATER TEMPERATURE | 28.06 | Deg. Celsius |
| Lagoon at Festival Beach | Lake | 2020-07-29 11:35:00 | Alkalinity/Hardness/pH | PH | 7.70 | Standard units |
| Lagoon at Festival Beach | Lake | 2020-07-29 11:35:00 | Conventionals | WATER TEMPERATURE | 28.65 | Deg. Celsius |
| Redbud West of Parking Lot | Lake | 2020-07-29 10:55:00 | Conventionals | WATER TEMPERATURE | 26.21 | Deg. Celsius |
| Redbud West of Parking Lot | Lake | 2020-07-29 10:55:00 | Alkalinity/Hardness/pH | PH | 7.22 | Standard units |
| Barton Creek Mouth upstream @ Lady Bird Lake | Stream | 2020-07-29 10:20:00 | Conventionals | WATER TEMPERATURE | 21.85 | Deg. Celsius |
| Barton Creek Mouth upstream @ Lady Bird Lake | Stream | 2020-07-29 10:20:00 | Alkalinity/Hardness/pH | PH | 6.94 | Standard units |
| Lady Bird Lake @ Powerline | Lake | 2020-07-29 09:45:00 | Conventionals | WATER TEMPERATURE | 27.13 | Deg. Celsius |
| Lady Bird Lake @ Powerline | Lake | 2020-07-29 09:45:00 | Alkalinity/Hardness/pH | PH | 7.60 | Standard units |
| Waller Creek Downstream of Cesar Chavez | Stream | 2020-07-22 14:25:00 | Alkalinity/Hardness/pH | PH | 8.34 | Standard units |
| Waller Creek Downstream of Cesar Chavez | Stream | 2020-07-22 14:25:00 | Conventionals | WATER TEMPERATURE | 28.99 | Deg. Celsius |
| Waller Creek Upstream of 23rd Street | Stream | 2020-07-22 13:50:00 | Alkalinity/Hardness/pH | PH | 7.97 | Standard units |
| Waller Creek Upstream of 23rd Street | Stream | 2020-07-22 13:50:00 | Conventionals | WATER TEMPERATURE | 27.18 | Deg. Celsius |
| Waller Creek @ Shipe Park | Stream | 2020-07-22 13:25:00 | Conventionals | WATER TEMPERATURE | 28.06 | Deg. Celsius |
| Waller Creek @ Shipe Park | Stream | 2020-07-22 13:25:00 | Alkalinity/Hardness/pH | PH | 7.92 | Standard units |
| Spicewood Tributary Downstream of Spicewood Spring | Stream | 2020-07-22 12:00:00 | Alkalinity/Hardness/pH | PH | 7.36 | Standard units |
| Spicewood Tributary Downstream of Spicewood Spring | Stream | 2020-07-22 12:00:00 | Conventionals | WATER TEMPERATURE | 25.09 | Deg. Celsius |
| Lagoon at Festival Beach | Lake | 2020-07-22 11:55:00 | Conventionals | WATER TEMPERATURE | 29.99 | Deg. Celsius |
| Lagoon at Festival Beach | Lake | 2020-07-22 11:55:00 | Alkalinity/Hardness/pH | PH | 7.98 | Standard units |
| Taylor Slough South @ Reed Park (TSS) | Stream | 2020-07-22 11:35:00 | Alkalinity/Hardness/pH | PH | 8.08 | Standard units |
| Taylor Slough South @ Reed Park (TSS) | Stream | 2020-07-22 11:35:00 | Conventionals | WATER TEMPERATURE | 25.83 | Deg. Celsius |
| Redbud West of Parking Lot | Lake | 2020-07-22 11:05:00 | Conventionals | WATER TEMPERATURE | 27.52 | Deg. Celsius |
| Redbud West of Parking Lot | Lake | 2020-07-22 11:05:00 | Alkalinity/Hardness/pH | PH | 7.19 | Standard units |
| Barton Creek Mouth upstream @ Lady Bird Lake | Stream | 2020-07-22 10:35:00 | Conventionals | WATER TEMPERATURE | 22.02 | Deg. Celsius |
| Barton Creek Mouth upstream @ Lady Bird Lake | Stream | 2020-07-22 10:35:00 | Alkalinity/Hardness/pH | PH | 7.01 | Standard units |
| Lady Bird Lake @ Powerline | Lake | 2020-07-22 10:00:00 | Conventionals | WATER TEMPERATURE | 27.47 | Deg. Celsius |
| Lady Bird Lake @ Powerline | Lake | 2020-07-22 10:00:00 | Alkalinity/Hardness/pH | PH | 7.55 | Standard units |
| Onion Creek @ South Austin Regional WWTP (SAR) | Stream | 2020-07-16 09:55:00 | Conventionals | WATER TEMPERATURE | 29.46 | Deg. Celsius |
| Onion Creek @ South Austin Regional WWTP (SAR) | Stream | 2020-07-16 09:55:00 | Alkalinity/Hardness/pH | PH | 7.70 | Standard units |
| Lagoon at Festival Beach | Lake | 2020-07-14 12:20:00 | Conventionals | WATER TEMPERATURE | 29.86 | Deg. Celsius |
| Lagoon at Festival Beach | Lake | 2020-07-14 12:20:00 | Alkalinity/Hardness/pH | PH | 6.99 | Standard units |
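Blank or missing values can skew our results quite a bit, so we check the Results column for them. Note that na.omit() only returns the non-missing values and does not modify water2. All 104 results come back below, so there are no missing values to remove.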
na.omit(water2$Results)
## [1] 8.00 30.77 7.26 27.82 28.15 27.97 27.82 7.63 7.93 30.33 28.81 8.00
## [13] 28.24 7.68 7.28 7.38 7.55 29.12 28.28 7.97 7.46 27.69 27.84 7.62
## [25] 27.71 7.65 7.67 7.60 27.77 27.73 28.01 7.61 7.88 30.17 7.70 28.23
## [37] 7.63 26.02 7.61 27.20 27.46 7.72 7.71 27.40 7.77 27.47 7.75 27.49
## [49] 7.57 27.04 29.08 6.95 22.17 7.04 6.97 21.61 21.56 7.18 29.07 7.69
## [61] 27.74 7.58 23.07 7.39 7.71 27.45 27.20 7.66 8.09 30.27 23.26 7.09
## [73] 7.54 28.06 7.70 28.65 26.21 7.22 21.85 6.94 27.13 7.60 8.34 28.99
## [85] 7.97 27.18 28.06 7.92 7.36 25.09 29.99 7.98 8.08 25.83 27.52 7.19
## [97] 22.02 7.01 27.47 7.55 29.46 7.70 29.86 6.99
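If the data did contain missing results, the rows would have to be dropped from the tibble itself and the result assigned back. Here is a minimal sketch using tidyr's drop_na() on made-up data; `toy` and its values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with one missing result
toy <- tibble(
  Parameter = c("PH", "WATER TEMPERATURE", "PH"),
  Results   = c(7.5, NA, 7.8)
)

# Count missing values first, to see whether any cleaning is needed
n_missing <- sum(is.na(toy$Results))

# drop_na() returns the tibble without rows that are NA in the named column;
# the result has to be assigned for the change to persist
toy_clean <- drop_na(toy, Results)
```

Checking the count first makes it clear whether the cleaning step changes anything.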
Next, we will drop the columns we no longer need, Parameter_Type and Unit, since we know what the corresponding values are for pH and water temperature. We will overwrite our water2 tibble with a copy of itself that excludes these two columns.
water2 <- water2[, -c(4, 7)] # column numbers of Parameter_Type and Unit
water2
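Dropping columns by position works, but it silently removes the wrong columns if the column order ever changes. As a sketch of a name-based alternative, the example below uses dplyr's select() with a minus sign on a hypothetical one-row tibble that mirrors the column names of water2.

```r
library(dplyr)

# Hypothetical one-row tibble with the same seven column names as water2
toy <- tibble(
  Site_Name      = "Site A",
  Site_Type      = "Lake",
  Sample_Time    = as.POSIXct("2020-08-18 15:10:00", tz = "UTC"),
  Parameter_Type = "Conventionals",
  Parameter      = "WATER TEMPERATURE",
  Results        = 27.5,
  Unit           = "Deg. Celsius"
)

# A minus sign in select() drops the named columns and keeps the rest
toy_slim <- select(toy, -Parameter_Type, -Unit)
names(toy_slim)
```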
Next, we will put the water temperature and pH readings that were taken at the same time and location into a single row, because they belong to the same observation and are just different variables. We will use the spread function from tidyr, which is part of the tidyverse.
Calling spread(water2, Parameter, Results) returns an error that points to specific row numbers. Let’s investigate what the issue is for these rows. We will look at the first five row numbers specified in the error message.
water2[c(23,27,28,31,32),]
It looks like there are multiple pH and water temperature values for the same site and time (18 August 2020 at 13:35, at Lady Bird Lake). So there are duplicate measurements in our dataset. We have not determined why this is the case (one possibility is that readings were taken at different depths, since the original data includes a depth_in_meters column that we dropped), so we will simply remove the duplicate values using the duplicated function.
duplicate <- water2[, -5] # drop the Results column (5th) so duplicates are detected on the key columns only
duplicate2 <- which(duplicated(duplicate)) #row numbers of values which are duplicates of earlier observations
duplicate2
## [1] 6 7 9 10 11 12 13 14 15 16 17 18 19 20 25 26 27 28 29 30 31 32 43 44 45
## [26] 46 47 48
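Another way to see which site, time, and parameter combinations repeat is to count the rows per combination. This is a minimal sketch on made-up data; `toy`, the site names, and the values are hypothetical.

```r
library(dplyr)

# Hypothetical readings: "Site A" has two PH values for the same time
toy <- tibble(
  Site_Name   = c("Site A", "Site A", "Site A", "Site B"),
  Sample_Time = c("13:35", "13:35", "13:35", "12:00"),
  Parameter   = c("PH", "PH", "WATER TEMPERATURE", "PH"),
  Results     = c(7.62, 7.65, 27.8, 7.1)
)

# Count rows per key combination and keep the combinations seen more than once
dup_keys <- toy %>%
  count(Site_Name, Sample_Time, Parameter) %>%
  filter(n > 1)
dup_keys
```

Unlike row numbers, this output names the affected sites and times directly, which makes the duplicates easier to interpret.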
There seem to be quite a few duplicate observations in our dataset. We will remove them from our water2 tibble, keeping the first reading of each, and try the spread again.
water2 <- water2[-duplicate2, ]
water2_wide <- spread(water2, Parameter, Results)
water2_wide
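In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), which can also combine duplicate readings instead of requiring that they be removed first. The sketch below averages duplicates with values_fn = mean. This is a different choice from keeping the first reading, and whether averaging is appropriate depends on why the duplicates exist. The data in `toy` is made up.

```r
library(dplyr)
library(tidyr)

# Hypothetical readings with two values per parameter at one site and time
toy <- tibble(
  Site_Name   = c("Site A", "Site A", "Site A", "Site A"),
  Sample_Time = c("13:35", "13:35", "13:35", "13:35"),
  Parameter   = c("PH", "PH", "WATER TEMPERATURE", "WATER TEMPERATURE"),
  Results     = c(7.6, 7.8, 27.0, 28.0)
)

# names_from supplies the new column names, values_from the cell values;
# values_fn is applied whenever several values land in the same cell
wide <- pivot_wider(toy,
                    names_from  = Parameter,
                    values_from = Results,
                    values_fn   = mean)
wide
```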
Looks like our spread worked this time. As a final cleanup, I’d like to rename the PH column to pH and the WATER TEMPERATURE column to Water_Temperature, using the colnames function.
colnames(water2_wide)[4] <- 'pH'
colnames(water2_wide)[5] <- 'Water_Temperature'
water2_wide
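Renaming by position depends on the column order that spread happens to produce. As a sketch of a name-based alternative, dplyr's rename() takes new_name = old_name pairs; the one-row tibble `wide` below is a hypothetical stand-in for water2_wide.

```r
library(dplyr)

# Hypothetical wide tibble with the column names that spread() produced
wide <- tibble(Site_Name = "Site A", PH = 7.6, `WATER TEMPERATURE` = 27.5)

# Backticks are needed for the old name because it contains a space
wide <- rename(wide, pH = PH, Water_Temperature = `WATER TEMPERATURE`)
names(wide)
```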
Now that we have a cleaned dataset, let’s look at the temperature and pH statistics:
pH Boxplot
boxplot(water2_wide$pH)
It looks like the pH levels ranged from roughly 7 to 8.2, with a median of about 7.6, across all sites.
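Reading values off a boxplot is approximate, so it can help to compute the summary statistics directly. The following sketch groups by site type and reports the median, mean, and range; `toy` and its pH values are made up for illustration.

```r
library(dplyr)

# Hypothetical pH readings by site type
toy <- tibble(
  Site_Type = c("Lake", "Lake", "Lake", "Stream", "Stream"),
  pH        = c(7.4, 7.6, 8.0, 7.0, 7.2)
)

# One row per site type; .groups = "drop" returns an ungrouped tibble
ph_stats <- toy %>%
  group_by(Site_Type) %>%
  summarise(median_pH = median(pH),
            mean_pH   = mean(pH),
            min_pH    = min(pH),
            max_pH    = max(pH),
            .groups   = "drop")
ph_stats
```

Note that the line inside a boxplot is the median, which can differ from the mean when the values are skewed.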
Water Temperature Histogram
ggplot(data=water2_wide,aes(x=Water_Temperature,fill=Site_Type))+geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
For water temperature, it looks like lakes may vary slightly less (their temperature points sit fairly close together) than streams and springs.
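The stat_bin() message suggests choosing a bin width explicitly. A minimal sketch, assuming one-degree bins are a sensible choice for temperatures in this range; the data frame `toy` is made up.

```r
library(ggplot2)

# Hypothetical temperatures by site type
toy <- data.frame(
  Water_Temperature = c(21.6, 22.2, 26.0, 27.2, 27.5, 27.8, 28.2, 30.2),
  Site_Type = c("Spring", "Stream", "Stream", "Lake", "Lake", "Lake", "Lake", "Lake")
)

# binwidth = 1 gives one bar per degree Celsius instead of the default 30 bins
p <- ggplot(toy, aes(x = Water_Temperature, fill = Site_Type)) +
  geom_histogram(binwidth = 1) +
  labs(x = "Water temperature (deg C)", y = "Number of samples", fill = "Site type")
p
```

With only a few dozen observations, wider bins tend to make the shape of each site type's distribution easier to compare.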
Let’s try a scatterplot of pH and water temperature to gauge whether there may be a correlation between the two for each site type. We will also add a linear fit line to help us detect any associations.
ggplot(water2_wide,
       aes(pH, Water_Temperature, color = Site_Type)) +
  geom_point() +
  geom_smooth(method = lm)
## `geom_smooth()` using formula 'y ~ x'
It appears that there is almost no correlation between the two, at least for lakes and streams (the fit lines appear almost flat, maybe slightly positive for streams). There may be a slightly negative correlation between the two for springs.
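The visual impression can be checked by computing the Pearson correlation coefficient for each site type. The sketch below uses deliberately simple made-up data in which the relationship is perfectly linear, so the coefficients come out at the extremes; real data would give values somewhere in between.

```r
library(dplyr)

# Hypothetical data: temperature rises with pH for lakes, falls for streams
toy <- tibble(
  Site_Type         = rep(c("Lake", "Stream"), each = 4),
  pH                = c(7.4, 7.6, 7.8, 8.0, 7.0, 7.2, 7.4, 7.6),
  Water_Temperature = c(27, 28, 29, 30, 26, 25, 24, 23)
)

# cor() defaults to the Pearson coefficient; n records the group size
cors <- toy %>%
  group_by(Site_Type) %>%
  summarise(r = cor(pH, Water_Temperature),
            n = n(),
            .groups = "drop")
cors
```

Reporting the group size alongside the coefficient matters here, because a correlation based on only a handful of points is weak evidence either way.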