In the scope of this paper, two methods of computing the daily mean temperature are compared on hourly measurements from 01.03.2005 to 31.12.2016. The objective of the comparison is to detect errors or bias, so that the daily mean temperature is as precise as possible. The first method, which was practiced before 2001, uses only three observation times: 7 am, 2 pm and 9 pm. The 9 pm value is counted twice, so four terms enter the mean. The second method is the ordinary arithmetic mean: all 24 hourly measurements of a day are summed and the sum is divided by 24. Plots of both daily means and of their difference against the date are shown.
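Written as formulas, the two daily means are given below. $T_h$ is notation introduced here for convenience and denotes the air temperature measured at hour $h$:

$$
\bar{T}_{\text{old}} = \frac{T_{7} + T_{14} + 2\,T_{21}}{4},
\qquad
\bar{T}_{24} = \frac{1}{24}\sum_{h=0}^{23} T_h
$$

As a worked check, the hourly values of 2005-03-01 listed below give $\bar{T}_{\text{old}} = (-16.8 - 4.5 + 2\cdot(-15.6))/4 = -52.5/4 = -13.125$ °C and $\bar{T}_{24} = -313.5/24 = -13.0625$ °C.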
The hourly air temperature measurements of station Ammerang-Pfaffing (station id 00154), available at (ftp://ftp-cdc.dwd.de//pub/CDC/observations_germany/climate/hourly/air_temperature/historical/stundenwerte_TU_00154_20050301_20161231_hist.zip), were downloaded from the German National Meteorological Service (Deutscher Wetterdienst, DWD). After setting the working directory, the .txt file is read with `;` as the field separator. The columns are renamed on import, so "MESS_DATUM" becomes "datetime". The columns that are not needed are dropped: the station number "id", the quality level of the next column "qn9", the relative humidity "rf_tu" and the end-of-record marker "eor". Only "datetime" and the air temperature "tt_tu" are kept. To avoid printing long listings, the head and tail functions show only the first and last 26 rows. To group the measurements by day and by hour, the datetime value (format YYYYMMDDHH) is then parsed, and the date and the hour are stored in separate columns.
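filename <- "produkt_tu_stunde_20050301_20161231_00154.txt"
# Read the semicolon-separated DWD file; col.names renames MESS_DATUM to
# "datetime", and only the timestamp and air temperature columns are kept
data <- read.table(filename,
                   sep = ";",
                   col.names = c("id", "datetime", "qn9", "tt_tu", "rf_tu", "eor"),
                   header = TRUE,
                   strip.white = TRUE
                   )[c("datetime", "tt_tu")]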
head(data,n=26);tail(data,n=26)
## datetime tt_tu
## 1 2005030100 -17.1
## 2 2005030101 -16.9
## 3 2005030102 -17.7
## 4 2005030103 -17.1
## 5 2005030104 -18.5
## 6 2005030105 -19.8
## 7 2005030106 -19.5
## 8 2005030107 -16.8
## 9 2005030108 -15.5
## 10 2005030109 -12.3
## 11 2005030110 -8.9
## 12 2005030111 -7.2
## 13 2005030112 -6.6
## 14 2005030113 -5.5
## 15 2005030114 -4.5
## 16 2005030115 -2.3
## 17 2005030116 -6.3
## 18 2005030117 -9.5
## 19 2005030118 -12.9
## 20 2005030119 -13.4
## 21 2005030120 -15.1
## 22 2005030121 -15.6
## 23 2005030122 -16.1
## 24 2005030123 -18.4
## 25 2005030200 -19.0
## 26 2005030201 -18.5
## datetime tt_tu
## 103570 2016123022 -4.5
## 103571 2016123023 -5.5
## 103572 2016123100 -6.1
## 103573 2016123101 -6.8
## 103574 2016123102 -6.7
## 103575 2016123103 -6.9
## 103576 2016123104 -7.3
## 103577 2016123105 -7.6
## 103578 2016123106 -7.9
## 103579 2016123107 -7.7
## 103580 2016123108 -7.0
## 103581 2016123109 -4.6
## 103582 2016123110 -3.0
## 103583 2016123111 -0.9
## 103584 2016123112 1.8
## 103585 2016123113 2.3
## 103586 2016123114 3.0
## 103587 2016123115 0.9
## 103588 2016123116 -1.7
## 103589 2016123117 -2.7
## 103590 2016123118 -3.3
## 103591 2016123119 -3.3
## 103592 2016123120 -3.9
## 103593 2016123121 -5.3
## 103594 2016123122 -6.2
## 103595 2016123123 -6.3
# Parse the YYYYMMDDHH timestamps; the time zone is fixed to UTC so the
# result does not depend on the local settings of the machine
datetimes <- as.POSIXlt(as.character(data$datetime), format = "%Y%m%d%H", tz = "UTC")
data$hour <- datetimes$hour        # hour of day, 0..23
data$date <- as.Date(datetimes)    # calendar date
data <- data[c("date", "hour", "tt_tu")]
head(data,n=26);tail(data,n=26)
## date hour tt_tu
## 1 2005-03-01 0 -17.1
## 2 2005-03-01 1 -16.9
## 3 2005-03-01 2 -17.7
## 4 2005-03-01 3 -17.1
## 5 2005-03-01 4 -18.5
## 6 2005-03-01 5 -19.8
## 7 2005-03-01 6 -19.5
## 8 2005-03-01 7 -16.8
## 9 2005-03-01 8 -15.5
## 10 2005-03-01 9 -12.3
## 11 2005-03-01 10 -8.9
## 12 2005-03-01 11 -7.2
## 13 2005-03-01 12 -6.6
## 14 2005-03-01 13 -5.5
## 15 2005-03-01 14 -4.5
## 16 2005-03-01 15 -2.3
## 17 2005-03-01 16 -6.3
## 18 2005-03-01 17 -9.5
## 19 2005-03-01 18 -12.9
## 20 2005-03-01 19 -13.4
## 21 2005-03-01 20 -15.1
## 22 2005-03-01 21 -15.6
## 23 2005-03-01 22 -16.1
## 24 2005-03-01 23 -18.4
## 25 2005-03-02 0 -19.0
## 26 2005-03-02 1 -18.5
## date hour tt_tu
## 103570 2016-12-30 22 -4.5
## 103571 2016-12-30 23 -5.5
## 103572 2016-12-31 0 -6.1
## 103573 2016-12-31 1 -6.8
## 103574 2016-12-31 2 -6.7
## 103575 2016-12-31 3 -6.9
## 103576 2016-12-31 4 -7.3
## 103577 2016-12-31 5 -7.6
## 103578 2016-12-31 6 -7.9
## 103579 2016-12-31 7 -7.7
## 103580 2016-12-31 8 -7.0
## 103581 2016-12-31 9 -4.6
## 103582 2016-12-31 10 -3.0
## 103583 2016-12-31 11 -0.9
## 103584 2016-12-31 12 1.8
## 103585 2016-12-31 13 2.3
## 103586 2016-12-31 14 3.0
## 103587 2016-12-31 15 0.9
## 103588 2016-12-31 16 -1.7
## 103589 2016-12-31 17 -2.7
## 103590 2016-12-31 18 -3.3
## 103591 2016-12-31 19 -3.3
## 103592 2016-12-31 20 -3.9
## 103593 2016-12-31 21 -5.3
## 103594 2016-12-31 22 -6.2
## 103595 2016-12-31 23 -6.3
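The following sketch shows an alternative that avoids date-time classes altogether: the YYYYMMDDHH value is split with integer arithmetic. It assumes that `datetime` was read as a whole number, which `read.table` normally does for such values. The two stamps are copied from the first and last printed rows, and the POSIXlt route is included for comparison.

```r
# Two timestamps copied from the first and last printed rows (YYYYMMDDHH)
stamps <- c(2005030100L, 2016123123L)

# Integer arithmetic: the last two digits are the hour, the rest is the date
hour_arith <- stamps %% 100L
date_arith <- as.Date(as.character(stamps %/% 100L), format = "%Y%m%d")

# POSIXlt route for comparison, with the time zone fixed to UTC
lt <- as.POSIXlt(as.character(stamps), format = "%Y%m%d%H", tz = "UTC")
hour_lt <- lt$hour
date_lt <- as.Date(lt)

print(data.frame(date = date_arith, hour = hour_arith))
```

Both routes give the same date and hour columns. The arithmetic version never interprets the stamp as a point in time.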
Hourly measurements may not have been taken on every day. Therefore, days containing the missing-value marker -999 and days without the full set of 24 hourly measurements are omitted. On such days the measurements typically start later, for instance at 7 am or 3 pm.
# Count the hourly rows per day and find days without exactly 24 of them
counts_agg <- aggregate(data$hour, by = list(data$date), length)
indices <- which(counts_agg$x != 24)
indices
## [1] 1186 1319 1346 1347 1579 2678 2679 2682 2683 3490
days_to_remove = counts_agg[indices, ]$Group.1
days_to_remove
## [1] "2008-05-29" "2008-10-09" "2008-11-05" "2008-11-11" "2009-07-01"
## [6] "2012-07-04" "2012-07-05" "2012-07-08" "2012-07-09" "2014-09-24"
# Drop all rows of incomplete days (logical indexing keeps every row if no day matches)
data <- data[!(data$date %in% days_to_remove), ]
head(data,n=26);tail(data,n=26)
## date hour tt_tu
## 1 2005-03-01 0 -17.1
## 2 2005-03-01 1 -16.9
## 3 2005-03-01 2 -17.7
## 4 2005-03-01 3 -17.1
## 5 2005-03-01 4 -18.5
## 6 2005-03-01 5 -19.8
## 7 2005-03-01 6 -19.5
## 8 2005-03-01 7 -16.8
## 9 2005-03-01 8 -15.5
## 10 2005-03-01 9 -12.3
## 11 2005-03-01 10 -8.9
## 12 2005-03-01 11 -7.2
## 13 2005-03-01 12 -6.6
## 14 2005-03-01 13 -5.5
## 15 2005-03-01 14 -4.5
## 16 2005-03-01 15 -2.3
## 17 2005-03-01 16 -6.3
## 18 2005-03-01 17 -9.5
## 19 2005-03-01 18 -12.9
## 20 2005-03-01 19 -13.4
## 21 2005-03-01 20 -15.1
## 22 2005-03-01 21 -15.6
## 23 2005-03-01 22 -16.1
## 24 2005-03-01 23 -18.4
## 25 2005-03-02 0 -19.0
## 26 2005-03-02 1 -18.5
## date hour tt_tu
## 103570 2016-12-30 22 -4.5
## 103571 2016-12-30 23 -5.5
## 103572 2016-12-31 0 -6.1
## 103573 2016-12-31 1 -6.8
## 103574 2016-12-31 2 -6.7
## 103575 2016-12-31 3 -6.9
## 103576 2016-12-31 4 -7.3
## 103577 2016-12-31 5 -7.6
## 103578 2016-12-31 6 -7.9
## 103579 2016-12-31 7 -7.7
## 103580 2016-12-31 8 -7.0
## 103581 2016-12-31 9 -4.6
## 103582 2016-12-31 10 -3.0
## 103583 2016-12-31 11 -0.9
## 103584 2016-12-31 12 1.8
## 103585 2016-12-31 13 2.3
## 103586 2016-12-31 14 3.0
## 103587 2016-12-31 15 0.9
## 103588 2016-12-31 16 -1.7
## 103589 2016-12-31 17 -2.7
## 103590 2016-12-31 18 -3.3
## 103591 2016-12-31 19 -3.3
## 103592 2016-12-31 20 -3.9
## 103593 2016-12-31 21 -5.3
## 103594 2016-12-31 22 -6.2
## 103595 2016-12-31 23 -6.3
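A caveat applies to removal steps of the form `data[-which(...), ]`. They remove every row when `which()` finds nothing, because `-integer(0)` is still an empty index. The toy example below contrasts this with logical indexing, which keeps all rows in that case. Its values are copied from the first printed rows, and the empty list of days to remove is hypothetical.

```r
# Toy data frame with values copied from the first printed rows
toy <- data.frame(date = as.Date(c("2005-03-01", "2005-03-01", "2005-03-02")),
                  tt_tu = c(-17.1, -16.9, -19.0))
bad_days <- as.Date(character(0))   # hypothetical: nothing to remove

# Negative indexing with an empty which() result selects zero rows
idx <- which(toy$date %in% bad_days)
dropped_everything <- toy[-idx, ]

# Logical indexing keeps every row when nothing matches
kept <- toy[!(toy$date %in% bad_days), ]

nrow(dropped_everything)  # 0
nrow(kept)                # 3
```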
# -999 marks a missing value; drop every day that contains at least one
missing_days <- unique(data$date[abs(data$tt_tu) > 100])
data <- data[!(data$date %in% missing_days), ]
head(data,n=26);tail(data,n=26)
## date hour tt_tu
## 1 2005-03-01 0 -17.1
## 2 2005-03-01 1 -16.9
## 3 2005-03-01 2 -17.7
## 4 2005-03-01 3 -17.1
## 5 2005-03-01 4 -18.5
## 6 2005-03-01 5 -19.8
## 7 2005-03-01 6 -19.5
## 8 2005-03-01 7 -16.8
## 9 2005-03-01 8 -15.5
## 10 2005-03-01 9 -12.3
## 11 2005-03-01 10 -8.9
## 12 2005-03-01 11 -7.2
## 13 2005-03-01 12 -6.6
## 14 2005-03-01 13 -5.5
## 15 2005-03-01 14 -4.5
## 16 2005-03-01 15 -2.3
## 17 2005-03-01 16 -6.3
## 18 2005-03-01 17 -9.5
## 19 2005-03-01 18 -12.9
## 20 2005-03-01 19 -13.4
## 21 2005-03-01 20 -15.1
## 22 2005-03-01 21 -15.6
## 23 2005-03-01 22 -16.1
## 24 2005-03-01 23 -18.4
## 25 2005-03-02 0 -19.0
## 26 2005-03-02 1 -18.5
## date hour tt_tu
## 103570 2016-12-30 22 -4.5
## 103571 2016-12-30 23 -5.5
## 103572 2016-12-31 0 -6.1
## 103573 2016-12-31 1 -6.8
## 103574 2016-12-31 2 -6.7
## 103575 2016-12-31 3 -6.9
## 103576 2016-12-31 4 -7.3
## 103577 2016-12-31 5 -7.6
## 103578 2016-12-31 6 -7.9
## 103579 2016-12-31 7 -7.7
## 103580 2016-12-31 8 -7.0
## 103581 2016-12-31 9 -4.6
## 103582 2016-12-31 10 -3.0
## 103583 2016-12-31 11 -0.9
## 103584 2016-12-31 12 1.8
## 103585 2016-12-31 13 2.3
## 103586 2016-12-31 14 3.0
## 103587 2016-12-31 15 0.9
## 103588 2016-12-31 16 -1.7
## 103589 2016-12-31 17 -2.7
## 103590 2016-12-31 18 -3.3
## 103591 2016-12-31 19 -3.3
## 103592 2016-12-31 20 -3.9
## 103593 2016-12-31 21 -5.3
## 103594 2016-12-31 22 -6.2
## 103595 2016-12-31 23 -6.3
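The two filtering steps can also be combined into a single function, sketched below. `complete_days` is a hypothetical helper and is not part of the original analysis. It assumes the `date` and `tt_tu` columns built above, and it matches the -999 marker exactly instead of using the `abs(tt_tu) > 100` test. The toy input is made up for illustration: one complete day, one day with a -999 value and one day with only 23 hours.

```r
# Hypothetical helper: keep only days with 24 hourly rows and no missing value
complete_days <- function(df, missing_code = -999, hours_per_day = 24) {
  df$tt_tu[which(df$tt_tu == missing_code)] <- NA            # sentinel -> NA
  n_rows <- ave(df$tt_tu, df$date, FUN = length)             # rows per day
  n_na   <- ave(as.numeric(is.na(df$tt_tu)), df$date, FUN = sum)  # NAs per day
  df[n_rows == hours_per_day & n_na == 0, ]
}

# Toy input (made-up values)
toy <- data.frame(
  date  = as.Date(rep(c("2005-03-01", "2005-03-02", "2005-03-03"), c(24, 24, 23))),
  hour  = c(0:23, 0:23, 0:22),
  tt_tu = c(rep(1, 24), c(rep(2, 23), -999), rep(3, 23))
)
clean <- complete_days(toy)
unique(clean$date)
```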
Separate functions were written for both methods, so other formulas can easily be substituted. mean_1 computes the arithmetic mean of the 24 hourly values. mean_2 implements the old formula 0.25·(T7 + T14 + 2·T21). In mean_2 each hour is shifted by one position, because R vectors are indexed from 1 while the hours are counted from 0: the value for hour h is element h + 1. Thereby, 7 am is element 8, 2 pm is element 15 and 9 pm is element 22. This positional lookup relies on every remaining day having exactly 24 rows sorted by hour, which the filtering above ensures. The aggregate function then applies both functions to the temperature values of each day. To track the difference between the two methods and later plot it, a difference column (mean_1 minus mean_2) is added to the table.
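# 24-hour method: arithmetic mean of all hourly values of a day
mean_1 <- function(values) {
  output = mean(values)
  return(output)
}
# Old method: 0.25 * (T7 + T14 + 2 * T21); element h + 1 holds hour h
mean_2 <- function(values) {
  output = 0.25*(values[8] + values[15] + 2*values[22])
  return(output)
}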
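The flexibility mentioned above can be sketched with a hypothetical helper, `fixed_hour_mean`. It builds a daily-mean function from observation hours and weights, and it looks values up by their hour label instead of their position. The result therefore does not depend on each day having exactly 24 sorted rows. The test values are the hourly temperatures of 2005-03-01 from the tables above.

```r
# Hypothetical helper: daily mean from fixed observation hours and weights,
# selecting values by hour label; returns NA if a required hour is missing
fixed_hour_mean <- function(hours, weights) {
  stopifnot(length(hours) == length(weights), isTRUE(all.equal(sum(weights), 1)))
  function(temp, hour) {
    sum(weights * temp[match(hours, hour)])
  }
}

# Old formula: 0.25 * (T7 + T14 + 2 * T21)
old_mean <- fixed_hour_mean(hours = c(7, 14, 21), weights = c(0.25, 0.25, 0.5))

# Hourly values of 2005-03-01 as printed above (hours 0..23)
temp <- c(-17.1, -16.9, -17.7, -17.1, -18.5, -19.8, -19.5, -16.8,
          -15.5, -12.3,  -8.9,  -7.2,  -6.6,  -5.5,  -4.5,  -2.3,
           -6.3,  -9.5, -12.9, -13.4, -15.1, -15.6, -16.1, -18.4)
hour <- 0:23

old_mean(temp, hour)   # -13.125
mean(temp)             # -13.0625
```

With the data frame built above, the helper could be applied per day with, for example, `sapply(split(data, data$date), function(d) old_mean(d$tt_tu, d$hour))`.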
# Daily values of both methods; difference = mean_1 - mean_2
data_agg <- aggregate(data$tt_tu, by = list(data$date), mean_2)
data_agg$y <- aggregate(data$tt_tu, by = list(data$date), mean_1)$x
data_agg$difference <- data_agg$y - data_agg$x
# Name all four columns explicitly
colnames(data_agg) <- c("Date", "mean_2", "mean_1", "difference")
head(data_agg,n=26);tail(data_agg,n=26)
## Date mean_2 mean_1 difference
## 1 2005-03-01 -13.125 -13.0625000 0.06250000
## 2 2005-03-02 -11.450 -11.3041667 0.14583333
## 3 2005-03-03 -4.275 -6.2625000 -1.98750000
## 4 2005-03-04 -5.550 -5.1708333 0.37916667
## 5 2005-03-05 -3.475 -4.0833333 -0.60833333
## 6 2005-03-06 -5.325 -5.0000000 0.32500000
## 7 2005-03-07 -4.675 -5.6500000 -0.97500000
## 8 2005-03-08 -0.400 -1.3375000 -0.93750000
## 9 2005-03-09 0.275 0.2750000 0.00000000
## 10 2005-03-10 -4.400 -2.8166667 1.58333333
## 11 2005-03-11 0.625 -2.5708333 -3.19583333
## 12 2005-03-12 1.325 2.1416667 0.81666667
## 13 2005-03-13 -0.900 -0.1666667 0.73333333
## 14 2005-03-14 0.275 -0.4250000 -0.70000000
## 15 2005-03-15 3.025 3.6541667 0.62916667
## 16 2005-03-16 4.150 5.2875000 1.13750000
## 17 2005-03-17 12.025 9.7875000 -2.23750000
## 18 2005-03-18 12.975 13.0708333 0.09583333
## 19 2005-03-19 7.275 8.7500000 1.47500000
## 20 2005-03-20 1.925 2.7583333 0.83333333
## 21 2005-03-21 3.925 2.9916667 -0.93333333
## 22 2005-03-22 5.800 5.5708333 -0.22916667
## 23 2005-03-23 10.825 9.8666667 -0.95833333
## 24 2005-03-24 8.650 8.9291667 0.27916667
## 25 2005-03-25 7.425 7.9291667 0.50416667
## 26 2005-03-26 7.175 7.2416667 0.06666667
## Date mean_2 mean_1 difference
## 4282 2016-12-06 -2.625 -2.66250000 -0.0375000
## 4283 2016-12-07 -3.725 -3.63750000 0.0875000
## 4284 2016-12-08 -0.750 -1.60833333 -0.8583333
## 4285 2016-12-09 0.325 0.07916667 -0.2458333
## 4286 2016-12-10 0.425 0.23750000 -0.1875000
## 4287 2016-12-11 3.275 2.19166667 -1.0833333
## 4288 2016-12-12 2.100 2.67500000 0.5750000
## 4289 2016-12-13 0.500 0.12500000 -0.3750000
## 4290 2016-12-14 3.400 3.07083333 -0.3291667
## 4291 2016-12-15 1.625 1.23750000 -0.3875000
## 4292 2016-12-16 -0.500 -0.35833333 0.1416667
## 4293 2016-12-17 -1.175 -1.39583333 -0.2208333
## 4294 2016-12-18 0.375 0.15416667 -0.2208333
## 4295 2016-12-19 -0.375 -0.10416667 0.2708333
## 4296 2016-12-20 -2.675 -2.52083333 0.1541667
## 4297 2016-12-21 -3.000 -2.95000000 0.0500000
## 4298 2016-12-22 -3.475 -3.26250000 0.2125000
## 4299 2016-12-23 0.025 -0.89166667 -0.9166667
## 4300 2016-12-24 4.025 3.27916667 -0.7458333
## 4301 2016-12-25 7.925 6.89583333 -1.0291667
## 4302 2016-12-26 7.550 7.66250000 0.1125000
## 4303 2016-12-27 2.750 2.62500000 -0.1250000
## 4304 2016-12-28 2.675 3.00000000 0.3250000
## 4305 2016-12-29 0.475 1.14583333 0.6708333
## 4306 2016-12-30 -2.950 -2.45416667 0.4958333
## 4307 2016-12-31 -3.825 -4.05000000 -0.2250000
In the plots, blue points show mean_1 (the 24-hour mean) and red points show mean_2 (the old formula). The last graph shows the daily difference mean_1 - mean_2. The comparison shows that the two methods generally agree, with daily differences ranging from about -4 to 3 °C.
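plot(data_agg$Date, data_agg$mean_1, col = "blue", pch = 20,
     xlab = "Date", ylab = "Daily mean temperature (°C)", main = "mean_1: 24-hour mean")
plot(data_agg$Date, data_agg$mean_2, col = "red", pch = 20,
     xlab = "Date", ylab = "Daily mean temperature (°C)", main = "mean_2: old formula")
plot(data_agg$Date, data_agg$difference, col = "black", type = "l", lwd = 0.5,
     xlab = "Date", ylab = "mean_1 - mean_2 (°C)", main = "Difference")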
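The visual reading of the difference plot can be backed with numbers using a small hypothetical helper, `difference_stats`, which summarises the daily differences. Here it is applied to the first five differences from the table above. Applied to `data_agg$difference`, it would give the actual range and spread over the whole period.

```r
# Hypothetical helper: numeric summary of daily differences (mean_1 - mean_2)
difference_stats <- function(d) {
  c(min = min(d), max = max(d), mean = mean(d),
    mean_abs = mean(abs(d)), sd = sd(d))
}

# First five differences printed in the table above
d <- c(0.06250000, 0.14583333, -1.98750000, 0.37916667, -0.60833333)
round(difference_stats(d), 4)
```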