Before the data can be analysed, it must be read into R. Two file formats are available, so the format of the file of interest must be identified first. The Dunmore East file has a .csv extension, so it can be read with the read.csv() function.
library(maps)
As the ‘VulnToolkit’ package was not found on the first attempt, it was installed from its GitHub repository, https://github.com/troyhill/VulnToolkit, before being loaded.
Dunmore_East_TG = read.csv(file.path(getwd(),'dunmore_east.csv'), skip = 1, header=TRUE)
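The skip argument of read.csv() (here skip = 1) drops the first line of the file, so the second line is read as the header. As the head() output below shows, the column names that remain are units (degrees_east, degrees_north, UTC, m), so it is the units row that has been kept as the header and the descriptive names row that was skipped.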
head(Dunmore_East_TG)
## degrees_east degrees_north UTC X
## 1 -6.99188 52.14767 2012-04-23T09:06:00Z Dunmore East Harbour
## 2 -6.99188 52.14767 2012-04-23T09:12:00Z Dunmore East Harbour
## 3 -6.99188 52.14767 2012-04-23T09:18:00Z Dunmore East Harbour
## 4 -6.99188 52.14767 2012-04-23T09:24:00Z Dunmore East Harbour
## 5 -6.99188 52.14767 2012-04-23T09:30:00Z Dunmore East Harbour
## 6 -6.99188 52.14767 2012-04-23T09:36:00Z Dunmore East Harbour
## m X.1
## 1 0.128 1
## 2 0.033 1
## 3 -0.052 1
## 4 -0.145 1
## 5 -0.229 1
## 6 -0.323 1
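Because skip = 1 leaves the units row as the header, the descriptive column names are lost at this point. A minimal sketch of one alternative, assuming the file has a names row followed by a units row: read the names from the first line, then read the data with skip = 2. The column names and the temporary file below are hypothetical stand-ins, not the actual contents of dunmore_east.csv.

```r
# Hypothetical two-line-header file, written to a temporary path for illustration
csv_lines <- c(
  "longitude,latitude,time,station_id,Water_Level_OD_Malin,QC_Flag",
  "degrees_east,degrees_north,UTC,,m,",
  "-6.99188,52.14767,2012-04-23T09:06:00Z,Dunmore East Harbour,0.128,1",
  "-6.99188,52.14767,2012-04-23T09:12:00Z,Dunmore East Harbour,0.033,1"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# Read the header plus one row, only to recover the column names
col_names <- names(read.csv(tmp, nrows = 1, header = TRUE))

# Skip both header lines and attach the recovered names to the data
dat <- read.csv(tmp, skip = 2, header = FALSE, col.names = col_names,
                stringsAsFactors = FALSE)
```

With this approach the renaming step would only be needed if shorter names are preferred.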
Some of the headers are not immediately meaningful (for example X and X.1). As such, the names() function is used to rename the columns shown by head().
names(Dunmore_East_TG)<-c("lon","lat","time","name","m_cor","flag")
View(Dunmore_East_TG)
Now that the csv headers are appropriately named, the last bit of data cleaning deals with the repeated lat and lon values. A tide gauge is stationary, so one value of each is enough, and the first row is stored.
DunEast.lat <- Dunmore_East_TG$lat[1]
DunEast.lon <- Dunmore_East_TG$lon[1]
Now the latitude and longitude are each stored once rather than in every row.
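Taking the first row assumes that every row carries the same coordinates. A small sketch of how that assumption could be checked before keeping a single value; the toy data frame and the station_lon and station_lat names are illustrative only.

```r
# Toy stand-in for the renamed tide gauge table (values are illustrative)
toy <- data.frame(lon = rep(-6.99188, 3), lat = rep(52.14767, 3))

# TRUE only if every row carries the same coordinate
lon_constant <- length(unique(toy$lon)) == 1
lat_constant <- length(unique(toy$lat)) == 1

# Stop early if the gauge position is not constant, then keep one value each
stopifnot(lon_constant, lat_constant)
station_lon <- toy$lon[1]
station_lat <- toy$lat[1]
```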
How would you look at the file? How do you know its structure outside of R?
As the downloaded file is in csv format, it can be opened in Excel to inspect the headers and the data types and formatting. The dimensions (e.g. the number of rows) can also be checked in Excel. Its structure can be identified as well, i.e. which character separates the columns (commas here) as opposed to characters that only appear within a column. For example, the date column uses dashes between year, month and day, but these are not column delimiters.
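The same checks can also be made on the raw text without Excel. A minimal sketch using base R functions that read the file as plain lines rather than parsing it into a table; the three-line file is a hypothetical stand-in written to a temporary path.

```r
# Hypothetical file written to a temporary path so the sketch runs on its own
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,time", "unit_a,unit_b,UTC", "1,2,2012-04-23T09:06:00Z"), tmp)

first_lines <- readLines(tmp, n = 2)       # raw text of the first two lines
n_fields <- count.fields(tmp, sep = ",")   # number of comma-separated fields per line
n_lines <- length(readLines(tmp))          # total number of lines in the file
```

If n_fields is the same on every line, the comma is a consistent column delimiter.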
map("world",c("ireland","uk"),fill=TRUE,xlim=c(-12,-4),ylim=c(51,56))
map.axes(cex.axis=1)
title(main="Location of Dunmore East Tide Gauge",xlab="Longitude",ylab="Latitude")
points(DunEast.lon,DunEast.lat,pch=21,col="gray",bg="red")
text(DunEast.lon-.5,DunEast.lat,"Dunmore East",col="gray", cex = 0.6)
Using the code from the GY667 Lab 2 skeleton script, the Dunmore East tide gauge can be plotted on a map of Ireland. To check that the plotted position is correct, the location was looked up in Google Maps.
map("world",c("ireland","uk"),fill=TRUE,xlim=c(-8,-6),ylim=c(51.5,52.5))
map.axes(cex.axis=1)
title(main="Location of Dunmore East Tide Gauge",xlab="Longitude",ylab="Latitude")
points(DunEast.lon,DunEast.lat,pch=21,col="gray",bg="red")
text(DunEast.lon-.5,DunEast.lat,"Dunmore East",col="gray", cex = 0.6)
For the purposes of this lab, the Workshop 2 instructions file which accompanies these tasks specifies our interest in three variables: sea level, time, and quality flag.
head(Dunmore_East_TG$time)
## [1] 2012-04-23T09:06:00Z 2012-04-23T09:12:00Z 2012-04-23T09:18:00Z
## [4] 2012-04-23T09:24:00Z 2012-04-23T09:30:00Z 2012-04-23T09:36:00Z
## 532706 Levels: 2012-04-23T09:06:00Z ... 2019-02-20T15:50:00Z
The head() output shows that the time column holds the date and time as text (read in as a factor), which R cannot treat as a time until both the format of the string and the time zone are specified. The code below converts the text into a time object.
rtime<-as.POSIXlt(Dunmore_East_TG$time,format="%Y-%m-%dT%H:%M:%SZ",tz='UTC')
Now the text string is readable as a time string in R.
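As a small illustration of the conversion, the sketch below parses two time strings in the same layout plus one deliberately invalid string. The input vector is made up for the example; the format string and time zone match the call above.

```r
# Two valid strings in the file's layout and one that does not match
x <- c("2012-04-23T09:06:00Z", "2019-02-20T15:50:00Z", "not a time")
parsed <- as.POSIXlt(x, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Strings that do not match the format become NA rather than raising an error
failed <- is.na(parsed)

# POSIXlt stores years since 1900 and months as 0-11, hence the offsets
years <- parsed$year + 1900
months <- parsed$mon + 1
hours <- parsed$hour   # already 0-23, no offset needed
```

Checking is.na() on the parsed result is a quick way to confirm that every row was converted.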
DunEast <- data.frame("time"=rtime,
"year"=rtime$year+1900,
"month"=rtime$mon+1,
"day"=rtime$mday,
"hour"=rtime$hour+1,
"min"=rtime$min,
"sec"=rtime$sec,
"h"=Dunmore_East_TG$m_cor,
"flag"=Dunmore_East_TG$flag)
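Above, the data frame ‘DunEast’ was created with the time split into separate fields. Note that the hour field is computed as rtime$hour+1, so it runs from 1 to 24 rather than 0 to 23 (a reading at 09:06 is stored as hour 10). To make the tide gauge data more manageable, an hourly average data frame will need to be created.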
head(DunEast,n=10)
## time year month day hour min sec h flag
## 1 2012-04-23 09:06:00 2012 4 23 10 6 0 0.128 1
## 2 2012-04-23 09:12:00 2012 4 23 10 12 0 0.033 1
## 3 2012-04-23 09:18:00 2012 4 23 10 18 0 -0.052 1
## 4 2012-04-23 09:24:00 2012 4 23 10 24 0 -0.145 1
## 5 2012-04-23 09:30:00 2012 4 23 10 30 0 -0.229 1
## 6 2012-04-23 09:36:00 2012 4 23 10 36 0 -0.323 1
## 7 2012-04-23 09:42:00 2012 4 23 10 42 0 -0.376 1
## 8 2012-04-23 09:48:00 2012 4 23 10 48 0 -0.457 1
## 9 2012-04-23 09:54:00 2012 4 23 10 54 0 -0.536 1
## 10 2012-04-23 10:00:00 2012 4 23 11 0 0 -0.612 1
tail(DunEast,n=11)
## time year month day hour min sec h flag
## 532696 2019-02-20 14:55:00 2019 2 20 15 55 0 -0.396 0
## 532697 2019-02-20 15:00:00 2019 2 20 16 0 0 -0.330 0
## 532698 2019-02-20 15:05:00 2019 2 20 16 5 0 -0.153 0
## 532699 2019-02-20 15:10:00 2019 2 20 16 10 0 -0.124 0
## 532700 2019-02-20 15:15:00 2019 2 20 16 15 0 -0.040 0
## 532701 2019-02-20 15:20:00 2019 2 20 16 20 0 -0.039 0
## 532702 2019-02-20 15:25:00 2019 2 20 16 25 0 0.179 0
## 532703 2019-02-20 15:30:00 2019 2 20 16 30 0 0.179 0
## 532704 2019-02-20 15:35:00 2019 2 20 16 35 0 0.339 0
## 532705 2019-02-20 15:45:00 2019 2 20 16 45 0 0.433 0
## 532706 2019-02-20 15:50:00 2019 2 20 16 50 0 0.547 0
Using the head() and tail() functions, row 10 was identified as the first row that falls exactly on the hour, and row 532696 as the last row before a new hour starts. These two rows define a subset of the data that contains only complete hours.
DunEast.sub <- DunEast[10:532696,]
DunEast.hour <- aggregate(h~hour+day+month+year,DunEast.sub,mean)
Above, the subset of complete hours was created and the sea level was averaged for each hour using the aggregate() function.
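The hourly averaging can be illustrated on a toy series. The sketch below is an alternative formulation, not the method used above: it groups readings by a label for the start of their hour instead of by separate hour, day, month and year fields. The toy values and the hour_start column are assumptions made for the example.

```r
# Toy readings: two in the 10:00 hour and two in the 11:00 hour
toy <- data.frame(
  time = as.POSIXct(c("2012-04-23 10:00:00", "2012-04-23 10:30:00",
                      "2012-04-23 11:00:00", "2012-04-23 11:30:00"), tz = "UTC"),
  h = c(1, 3, 5, 7)
)

# Label each reading with the start of the hour it falls in
toy$hour_start <- format(toy$time, "%Y-%m-%d %H:00:00", tz = "UTC")

# Mean sea level per hour, and the number of readings behind each mean
toy_hourly <- aggregate(h ~ hour_start, data = toy, FUN = mean)
toy_counts <- aggregate(h ~ hour_start, data = toy, FUN = length)

# Convert the label back to a time object for plotting
toy_hourly$hour_start <- as.POSIXct(toy_hourly$hour_start, tz = "UTC")
```

The counts make it possible to spot hours with missing readings, which a row-index subset alone would not reveal.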
DunEast.hour <- cbind(DunEast.hour, NA)
names(DunEast.hour) <- c("hour","day","month", "year", "h", "time")
Above, a new column for the time stamp of each hourly mean was added using the cbind() function. The column is left blank (NA) for now.
DunEast.hour$time <- as.POSIXlt(sprintf("%s/%s/%s %s",
DunEast.hour$year, DunEast.hour$month,
DunEast.hour$day, DunEast.hour$hour),
format="%Y/%m/%d %H",tz='UTC')
Now that the time column is filled, the beginning and end of the dataset can be inspected.
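The time stamp above is built by pasting the fields into a string and parsing it. A sketch of an equivalent base R route, ISOdatetime(), which takes the fields as numbers directly. The toy values are illustrative, and the sketch assumes hours in the usual 0 to 23 range, which differs from the 1 to 24 hour field used in this workshop.

```r
# Toy hourly table with separate date fields (hours in the 0-23 convention)
toy <- data.frame(year = c(2012, 2012), month = c(4, 4),
                  day = c(23, 23), hour = c(11, 12))

# Build the time stamp from numeric fields; minutes and seconds are set to zero
toy$time <- ISOdatetime(toy$year, toy$month, toy$day, toy$hour, 0, 0, tz = "UTC")
```

ISOdatetime() returns POSIXct, which is stored as a single number per row and tends to behave more predictably inside a data frame than POSIXlt.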
head(DunEast.hour)
## hour day month year h time
## 1 11 23 4 2012 -0.9142 2012-04-23 11:00:00
## 2 12 23 4 2012 -1.4520 2012-04-23 12:00:00
## 3 13 23 4 2012 -1.8022 2012-04-23 13:00:00
## 4 14 23 4 2012 -1.7315 2012-04-23 14:00:00
## 5 15 23 4 2012 -1.1641 2012-04-23 15:00:00
## 6 16 23 4 2012 -0.3235 2012-04-23 16:00:00
tail(DunEast.hour)
## hour day month year h time
## 52845 10 20 2 2019 -0.3570833 2019-02-20 10:00:00
## 52846 11 20 2 2019 -1.1265000 2019-02-20 11:00:00
## 52847 12 20 2 2019 -1.7416667 2019-02-20 12:00:00
## 52848 13 20 2 2019 -2.0347500 2019-02-20 13:00:00
## 52849 14 20 2 2019 -1.7061667 2019-02-20 14:00:00
## 52850 15 20 2 2019 -0.8767500 2019-02-20 15:00:00
As with the hourly aggregation, the incomplete months at the start and end of the record are excluded, so that the monthly means are calculated from whole months only.
DunEast.month <- DunEast[1830:527045,]
DunEast.month <- aggregate(h~month+year,DunEast.month,mean)
From the 1st hour on the 01/05/
DunEast.month <- cbind(DunEast.month, NA)
names(DunEast.month) <- c("month", "year", "h", "time")
DunEast.month$time <- as.POSIXlt(sprintf("%s/%s/15", DunEast.month$year, DunEast.month$month), format="%Y/%m/%d", tz='UTC')
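The monthly steps above rely on row indices (1830:527045) to trim the partial months. A sketch of a date-based alternative, which does not depend on counting rows: keep readings between the start of the first full month and the start of the month after the last full one. The toy data and the start and end bounds are hypothetical.

```r
# Toy readings: one before the first full month, two inside it, one after it
toy <- data.frame(
  time = as.POSIXct(c("2012-04-30 12:00:00", "2012-05-01 00:00:00",
                      "2012-05-20 00:00:00", "2012-06-01 00:00:00"), tz = "UTC"),
  h = c(9, 1, 3, 9)
)

# Hypothetical bounds: keep May 2012 only (start inclusive, end exclusive)
start <- as.POSIXct("2012-05-01 00:00:00", tz = "UTC")
end <- as.POSIXct("2012-06-01 00:00:00", tz = "UTC")
full <- toy[toy$time >= start & toy$time < end, ]

# Derive month and year fields, then average per month
full$month <- as.integer(format(full$time, "%m", tz = "UTC"))
full$year <- as.integer(format(full$time, "%Y", tz = "UTC"))
monthly <- aggregate(h ~ month + year, data = full, FUN = mean)
```

Expressing the trim as dates keeps the subset correct even if rows are added to or removed from the source file.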
plot(DunEast.month$time,DunEast.month$h,
type = "b",
col = "red",
lwd = 2,
pch = 3,
xlab = "Year",
ylab = "Sea level [m OD Malin]",
main= "Dunmore East Monthly")
DunEast.month[,5:6] = NULL
write.csv(DunEast.month, file = "DunmoreEastMonthlySL_rev01.csv")
The write.csv() function always writes comma-separated values, so the sep argument is not needed; supplying it only produces a warning that the attempt to set 'sep' is ignored.
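A quick way to confirm that an export worked is to read the file back and compare it with the original table. The sketch below does this on a toy monthly table written to a temporary file. The values are illustrative, and row.names = FALSE is an assumption about the desired layout (it omits the row-number column), not something the call above sets.

```r
# Toy monthly table (values are illustrative)
toy <- data.frame(month = c(5, 6), year = c(2012, 2012), h = c(0.01, -0.02))

# Write without the row-number column, then read the file back
out <- tempfile(fileext = ".csv")
write.csv(toy, file = out, row.names = FALSE)
back <- read.csv(out)

# Compare column names and values of the round trip
same_names <- identical(names(back), names(toy))
same_values <- isTRUE(all.equal(back$h, toy$h))
```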
End of workshop.