GY667 Lab 2 - Analysing Dunmore East Tide Gauge Data

Task 1 - Loading the libraries and datasets

Before the data can be analysed, it must be read into R. As the data are available in two file formats, the format of interest needs to be identified first. For Dunmore East the file has a .csv extension, so the read.csv() function can be used.

library(maps)

As the package ‘VulnToolkit’ could not be found on the first attempt, it was installed from its GitHub repository (https://github.com/troyhill/VulnToolkit) before the libraries were loaded.


Dunmore_East_TG = read.csv(file.path(getwd(),'dunmore_east.csv'), skip = 1, header=TRUE)

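The skip argument of read.csv() drops the first line of the file before the header is read. In this file the first line appears to hold the variable names and the second the units, so with skip = 1 the units row becomes the header, which is why head() shows names such as degrees_east and m. As the units of each column are already known, the columns are simply renamed below.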
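To see how skip interacts with a file that has both a names row and a units row, the sketch below reads a small made-up CSV with that assumed layout. Its column names (longitude, sea_level and so on) are hypothetical, not copied from the Dunmore East file. It reproduces the X and X.1 names seen in the head() output and shows one possible alternative that keeps descriptive names from the first line.

```r
# Made-up miniature of a CSV with a names row followed by a units row.
# The column names are hypothetical; only the layout matters here.
csv_text <- c(
  "longitude,latitude,time,station_name,sea_level,quality_flag",
  "degrees_east,degrees_north,UTC,,m,",
  "-6.99188,52.14767,2012-04-23T09:06:00Z,Dunmore East Harbour,0.128,1",
  "-6.99188,52.14767,2012-04-23T09:12:00Z,Dunmore East Harbour,0.033,1"
)

# skip = 1 throws away the names row, so the units row is used as the header;
# the two blank units are turned into X and X.1 by make.names()
units_as_header <- read.csv(text = csv_text, skip = 1, header = TRUE)
print(names(units_as_header))

# One alternative: take the names from line 1 and read the data from line 3 onwards
col_names <- strsplit(csv_text[1], ",")[[1]]
data_rows <- read.csv(text = csv_text, skip = 2, header = FALSE, col.names = col_names)
print(head(data_rows))
```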

head(Dunmore_East_TG)
##   degrees_east degrees_north                  UTC                    X
## 1     -6.99188      52.14767 2012-04-23T09:06:00Z Dunmore East Harbour
## 2     -6.99188      52.14767 2012-04-23T09:12:00Z Dunmore East Harbour
## 3     -6.99188      52.14767 2012-04-23T09:18:00Z Dunmore East Harbour
## 4     -6.99188      52.14767 2012-04-23T09:24:00Z Dunmore East Harbour
## 5     -6.99188      52.14767 2012-04-23T09:30:00Z Dunmore East Harbour
## 6     -6.99188      52.14767 2012-04-23T09:36:00Z Dunmore East Harbour
##        m X.1
## 1  0.128   1
## 2  0.033   1
## 3 -0.052   1
## 4 -0.145   1
## 5 -0.229   1
## 6 -0.323   1

These headers are not meaningful as variable names: they are units, and the two blank ones have been read in as X and X.1. The names() function is therefore used to rename the columns identified with head().

names(Dunmore_East_TG)<-c("lon","lat","time","name","m_cor","flag")
View(Dunmore_East_TG)

Now that the columns are appropriately named, the last cleaning step deals with the repeated latitude and longitude values: the tide gauge is stationary, so a single value of each is enough.

DunEast.lat <- Dunmore_East_TG$lat[1]
DunEast.lon <- Dunmore_East_TG$lon[1]

The latitude and longitude are now stored once, as single values, rather than being read from every row.
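Taking the first value only makes sense if every row carries the same position. A minimal check, sketched here on a toy data frame whose values are copied from the head() output above (the names gauge, gauge_lat and gauge_lon are hypothetical), is to confirm that each coordinate column has a single unique value before keeping one copy. The sketch also drops the repeated columns, which the lab code does not do.

```r
# Toy stand-in for Dunmore_East_TG: a stationary gauge repeats its position on every row
gauge <- data.frame(
  lon = rep(-6.99188, 3),
  lat = rep(52.14767, 3),
  m_cor = c(0.128, 0.033, -0.052)
)

# Stop with an error if any row reports a different position
stopifnot(length(unique(gauge$lat)) == 1, length(unique(gauge$lon)) == 1)

# Keep a single copy of each coordinate
gauge_lat <- gauge$lat[1]
gauge_lon <- gauge$lon[1]

# Optionally remove the now-redundant columns from the data frame itself
gauge$lat <- NULL
gauge$lon <- NULL
```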

Questions from Task 1

How would you look at the file outside of R, and how would you know its structure?

As the downloaded file is in CSV format, it could be opened in Excel (or a plain text editor) to inspect the headers and the data type and formatting of each column. The dimensions, e.g. the number of rows, can also be checked there. Its structure would also be visible, i.e. which character separates the columns (commas) as opposed to characters that only occur within a column; for example, the date column uses dashes between the year, month and day.

Task 2: Plot the location of your tide gauge.

map("world",c("ireland","uk"),fill=TRUE,xlim=c(-12,-4),ylim=c(51,56))
map.axes(cex.axis=1)
title(main="Location of Dunmore East Tide Gauge",xlab="Longitude",ylab="Latitude")
points(DunEast.lon,DunEast.lat,pch=21,col="gray",bg="red")
text(DunEast.lon-.5,DunEast.lat,"Dunmore East",col="gray", cex = 0.6)

Using the code from the GY667 Lab 2 skeleton script, the Dunmore East tide gauge can be plotted on a map of Ireland. To check that the plotted location is correct, it was compared with a Google Maps search for Dunmore East, and the map was then redrawn zoomed in on the south-east coast:

map("world",c("ireland","uk"),fill=TRUE,xlim=c(-8,-6),ylim=c(51.5,52.5))
map.axes(cex.axis=1)
title(main="Location of Dunmore East Tide Gauge",xlab="Longitude",ylab="Latitude")
points(DunEast.lon,DunEast.lat,pch=21,col="gray",bg="red")
text(DunEast.lon-.5,DunEast.lat,"Dunmore East",col="gray", cex = 0.6)
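Rather than typing fixed limits for the zoomed map, the window can be centred on the gauge. The helper below, zoom_limits(), is a hypothetical function written for illustration; the half-widths of 1 degree of longitude and 0.5 degrees of latitude are assumptions chosen to give a window similar to the one used above.

```r
# Hypothetical helper: build xlim/ylim vectors centred on a point,
# so a zoomed map window follows the gauge instead of hard-coded numbers
zoom_limits <- function(lon, lat, half_width_deg = 1, half_height_deg = 0.5) {
  list(
    xlim = lon + c(-half_width_deg, half_width_deg),
    ylim = lat + c(-half_height_deg, half_height_deg)
  )
}

lims <- zoom_limits(-6.99188, 52.14767)
print(lims)
```

The two vectors could then be passed to the existing call as map("world", c("ireland","uk"), fill=TRUE, xlim=lims$xlim, ylim=lims$ylim).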

Task 3: Convert your data to POSIXlt date format.

For this lab, the accompanying Workshop 2 instructions specify three variables of interest: sea level, time and quality flag.

head(Dunmore_East_TG$time)
## [1] 2012-04-23T09:06:00Z 2012-04-23T09:12:00Z 2012-04-23T09:18:00Z
## [4] 2012-04-23T09:24:00Z 2012-04-23T09:30:00Z 2012-04-23T09:36:00Z
## 532706 Levels: 2012-04-23T09:06:00Z ... 2019-02-20T15:50:00Z

The head() output shows that the time column holds the date and time as text (read in as a factor, as the Levels line shows) in the form 2012-04-23T09:06:00Z. R cannot interpret this as a date-time unless both the format of the string and the time zone are specified (the trailing Z denotes UTC), so the following code was used to convert it:

rtime<-as.POSIXlt(Dunmore_East_TG$time,format="%Y-%m-%dT%H:%M:%SZ",tz='UTC')

The times are now held in rtime as a POSIXlt date-time object rather than as a text string.
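The parsing step and the field offsets used in the next data frame can be checked on a single timestamp. This sketch (the name stamp_lt is illustrative) parses one string in the same form as the time column and shows that POSIXlt stores years since 1900 and months from 0, while hours already run from 0 to 23.

```r
# One timestamp in the same ISO 8601 form as the Dunmore East time column
stamp <- "2012-04-23T09:06:00Z"

# The literal T and Z are written into the format string; tz = "UTC" matches the Z
stamp_lt <- as.POSIXlt(stamp, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# POSIXlt fields are offsets: years since 1900, months 0-11, hours 0-23
print(stamp_lt$year + 1900)  # calendar year
print(stamp_lt$mon + 1)      # calendar month
print(stamp_lt$hour)         # clock hour, no offset needed
```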

DunEast <- data.frame("time"=rtime,
                  "year"=rtime$year+1900,
                  "month"=rtime$mon+1,
                  "day"=rtime$mday,
                  "hour"=rtime$hour+1,
                  "min"=rtime$min,
                  "sec"=rtime$sec,
                  "h"=Dunmore_East_TG$m_cor,
                  "flag"=Dunmore_East_TG$flag)

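Above, the data frame ‘DunEast’ was created with the date and time split into separate fields. POSIXlt counts years from 1900 and months from 0 to 11, hence the +1900 and +1 offsets. Hours, however, already run from 0 to 23, so the +1 on the hour field labels each observation with the following hour (the 09:06 reading is given hour 10 in the output below), and the hourly means built from this field are time-stamped at the end of each averaging hour. The readings are also closely spaced (6 minutes apart at the start of the record and 5 minutes apart at the end), so, to make the tide gauge data more manageable, an hourly-average data frame is created next.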

head(DunEast,n=10)
##                   time year month day hour min sec      h flag
## 1  2012-04-23 09:06:00 2012     4  23   10   6   0  0.128    1
## 2  2012-04-23 09:12:00 2012     4  23   10  12   0  0.033    1
## 3  2012-04-23 09:18:00 2012     4  23   10  18   0 -0.052    1
## 4  2012-04-23 09:24:00 2012     4  23   10  24   0 -0.145    1
## 5  2012-04-23 09:30:00 2012     4  23   10  30   0 -0.229    1
## 6  2012-04-23 09:36:00 2012     4  23   10  36   0 -0.323    1
## 7  2012-04-23 09:42:00 2012     4  23   10  42   0 -0.376    1
## 8  2012-04-23 09:48:00 2012     4  23   10  48   0 -0.457    1
## 9  2012-04-23 09:54:00 2012     4  23   10  54   0 -0.536    1
## 10 2012-04-23 10:00:00 2012     4  23   11   0   0 -0.612    1
tail(DunEast,n=11) 
##                       time year month day hour min sec      h flag
## 532696 2019-02-20 14:55:00 2019     2  20   15  55   0 -0.396    0
## 532697 2019-02-20 15:00:00 2019     2  20   16   0   0 -0.330    0
## 532698 2019-02-20 15:05:00 2019     2  20   16   5   0 -0.153    0
## 532699 2019-02-20 15:10:00 2019     2  20   16  10   0 -0.124    0
## 532700 2019-02-20 15:15:00 2019     2  20   16  15   0 -0.040    0
## 532701 2019-02-20 15:20:00 2019     2  20   16  20   0 -0.039    0
## 532702 2019-02-20 15:25:00 2019     2  20   16  25   0  0.179    0
## 532703 2019-02-20 15:30:00 2019     2  20   16  30   0  0.179    0
## 532704 2019-02-20 15:35:00 2019     2  20   16  35   0  0.339    0
## 532705 2019-02-20 15:45:00 2019     2  20   16  45   0  0.433    0
## 532706 2019-02-20 15:50:00 2019     2  20   16  50   0  0.547    0

The head() and tail() output shows that row 10 (10:00:00) is the first observation that falls on the start of an hour, and that row 532696 (14:55:00) is the last observation before the final, incomplete hour begins at 15:00:00. These two rows were therefore used as the bounds of a subset containing only complete hours.

DunEast.sub <- DunEast[10:532696,]
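The two bounding rows were found by reading the head() and tail() output. As a sketch of how this could be done programmatically, the hypothetical helper complete_hour_rows() below returns the first observation falling exactly on the hour and the last one before the final hour starts; it assumes, as in this record, that the final hour is incomplete. It is demonstrated on a toy 20-minute series.

```r
# Hypothetical helper: first on-the-hour row, and the row just before the
# last on-the-hour row (assumes the final hour of the record is incomplete)
complete_hour_rows <- function(times) {
  on_hour <- which(times$min == 0 & times$sec == 0)
  c(first = on_hour[1], last = on_hour[length(on_hour)] - 1)
}

# Toy 20-minute series that starts (09:40) and ends (12:20) part-way through an hour
toy_times <- as.POSIXlt(seq(as.POSIXct("2012-04-23 09:40:00", tz = "UTC"),
                            as.POSIXct("2012-04-23 12:20:00", tz = "UTC"),
                            by = "20 min"))

rows <- complete_hour_rows(toy_times)
print(rows)  # rows spanning the complete hours 10:00 and 11:00
```

With the lab data it would be called on the POSIXlt vector rtime rather than DunEast$time, because data.frame() converts POSIXlt columns to POSIXct, which has no $min field. Going by the head() and tail() output, it should return rows 10 and 532696.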

DunEast.hour <- aggregate(h~hour+day+month+year,DunEast.sub,mean)

Above, the subset was averaged with aggregate(), giving one mean sea level for each combination of hour, day, month and year.
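The grouping can be seen on a toy example: aggregate() with the formula h ~ hour + day + month + year returns one row per unique combination of the grouping fields, with h replaced by its mean. The data frame below is made up for illustration.

```r
# Made-up 30-minute readings spanning two hours on one day
obs <- data.frame(
  hour = c(10, 10, 11, 11),
  day = 23, month = 4, year = 2012,
  h = c(0.2, 0.4, -0.1, -0.3)
)

# One mean of h per unique hour/day/month/year combination
hourly <- aggregate(h ~ hour + day + month + year, data = obs, FUN = mean)
print(hourly)
```

The result's columns come back in the order hour, day, month, year, h, which matches the first five names assigned to DunEast.hour in the lab code.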

DunEast.hour <- cbind(DunEast.hour, NA)
names(DunEast.hour) <- c("hour","day","month", "year", "h", "time")

Above, cbind() appends an empty column of NA values, and names() then labels all six columns, calling the new one time. This column is filled with a date-time for each hourly mean in the next step.

DunEast.hour$time <- as.POSIXlt(sprintf("%s/%s/%s %s", 
                                    DunEast.hour$year, DunEast.hour$month, 
                                    DunEast.hour$day, DunEast.hour$hour),
                            format="%Y/%m/%d %H",tz='UTC')
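The time column is built by pasting the fields into a string and parsing it. For comparison, base R's ISOdatetime() assembles the same instant directly from numeric fields; the sketch below, using one made-up row, checks that the two routes agree. This is an alternative technique, not the one used in the lab script.

```r
# One made-up hourly row, split into fields as in DunEast.hour
year <- 2012; month <- 4; day <- 23; hour <- 11

# String route, as in the lab: build "2012/4/23 11" and parse it
via_string <- as.POSIXct(as.POSIXlt(sprintf("%s/%s/%s %s", year, month, day, hour),
                                    format = "%Y/%m/%d %H", tz = "UTC"))

# Numeric route: ISOdatetime() builds the same instant without string formatting
via_fields <- ISOdatetime(year, month, day, hour, 0, 0, tz = "UTC")

print(via_string == via_fields)
```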

Now that the time column is filled, the beginning and end of the hourly data set can be checked.

head(DunEast.hour)
##   hour day month year       h                time
## 1   11  23     4 2012 -0.9142 2012-04-23 11:00:00
## 2   12  23     4 2012 -1.4520 2012-04-23 12:00:00
## 3   13  23     4 2012 -1.8022 2012-04-23 13:00:00
## 4   14  23     4 2012 -1.7315 2012-04-23 14:00:00
## 5   15  23     4 2012 -1.1641 2012-04-23 15:00:00
## 6   16  23     4 2012 -0.3235 2012-04-23 16:00:00
tail(DunEast.hour)
##       hour day month year          h                time
## 52845   10  20     2 2019 -0.3570833 2019-02-20 10:00:00
## 52846   11  20     2 2019 -1.1265000 2019-02-20 11:00:00
## 52847   12  20     2 2019 -1.7416667 2019-02-20 12:00:00
## 52848   13  20     2 2019 -2.0347500 2019-02-20 13:00:00
## 52849   14  20     2 2019 -1.7061667 2019-02-20 14:00:00
## 52850   15  20     2 2019 -0.8767500 2019-02-20 15:00:00

Task 4: Create monthly averages.

As with the hourly averages, the incomplete months at the start and end of the record are discarded, so that the subset starts and finishes on whole months.

DunEast.month <- DunEast[1830:527045,]

DunEast.month <- aggregate(h~month+year,DunEast.month,mean)

From the 1st hour on the 01/05/

DunEast.month <- cbind(DunEast.month, NA)
names(DunEast.month) <- c("month", "year", "h", "time")

# Date each monthly mean on the 15th, i.e. mid-month
DunEast.month$time <- as.POSIXlt(sprintf("%s/%s/15", DunEast.month$year, DunEast.month$month), format="%Y/%m/%d", tz='UTC')
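A toy version of the monthly step, with made-up values, shows the aggregation and the mid-month timestamp together. Here format is passed to as.POSIXlt(), where it belongs, and the result is converted with as.POSIXct() so that it sits cleanly in a data frame column; that conversion is a choice made for the sketch.

```r
# Made-up readings across two months
obs <- data.frame(
  month = c(5, 5, 6, 6),
  year = 2012,
  h = c(0.1, 0.3, -0.2, 0.0)
)

# One mean of h per month and year
monthly <- aggregate(h ~ month + year, data = obs, FUN = mean)

# Stamp each monthly mean on the 15th of its month
monthly$time <- as.POSIXct(as.POSIXlt(sprintf("%s/%s/15", monthly$year, monthly$month),
                                      format = "%Y/%m/%d", tz = "UTC"))
print(monthly)
```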

Task 5: Plot your monthly timeseries of sea level.

plot(DunEast.month$time,DunEast.month$h,
      type = "b",
      col = "red",
      lwd = 2,
      pch = 3,
      xlab = "Year",
      ylab = "Sea level [m OD Malin]",
      main= "Dunmore East Monthly")

DunEast.month[,5:6] = NULL

Task 6: Save your data.

# write.csv() always uses a comma separator, so no sep argument is given
write.csv(DunEast.month, file = "DunmoreEastMonthlySL_rev01.csv")
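Saving can be checked with a round trip. The sketch below writes a toy monthly table to a temporary file and reads it back; row.names = FALSE is an assumption made here to avoid the extra unnamed column of row numbers that write.csv() adds by default, so it differs from the call above.

```r
# Toy monthly table standing in for DunEast.month
monthly <- data.frame(month = c(5, 6), year = 2012, h = c(0.2, -0.1))

# write.csv() fixes the separator to a comma; write to a temporary file
out_file <- tempfile(fileext = ".csv")
write.csv(monthly, file = out_file, row.names = FALSE)

# Read the file back to confirm what was saved
reread <- read.csv(out_file)
print(reread)
```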

End of workshop.