The data come from https://en.wikipedia.org/wiki/Marathon_world_record_progression. A cleaned version in csv format, with non-record values removed, is available from https://www.dropbox.com/s/qyeitjqvsjmo2gn/marathon_clean.csv?dl=0; it is also listed in the Appendix below.
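The `time` column used in the fits is the record expressed in decimal hours; the Appendix lists it next to the original `Time` string. The following is a minimal sketch of that conversion, assuming the strings have the form `H:MM:SS` with optional decimals. The helper name `to_hours` is hypothetical and is not part of the original analysis.

```r
# Hypothetical helper: convert "H:MM:SS(.s)" strings to decimal hours.
to_hours <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)                 # hours, minutes, seconds
    p[1] + p[2] / 60 + p[3] / 3600
  }, numeric(1))
}

# First and last records of the Appendix table
to_hours(c("2:55:18.4", "2:02:57"))
```

The two values agree with the `time` column of the Appendix (2.921778 and 2.049167) up to rounding.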
We assume that the record has a lower bound, denoted \(h_{inf}\), and look for the best curve of the form \(h(x) = h_{inf}+e^{b+ax}\), where \(x\) is the date (in years) and \(h(x)\) is the record time (in hours). For a fixed \(h_{inf}\), we rewrite this relation as \(\log(h(x) - h_{inf})=b+ax\) and estimate \(a\) and \(b\) by least squares.
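For a fixed \(h_{inf}\), the least-squares step can be written out explicitly. As a sketch, writing \((x_i, t_i)\) for the observed dates and record times (notation introduced here for illustration, not taken from the original), the estimates are
\[ (\hat a, \hat b) = \arg\min_{a,b} \sum_{i} \left( \log(t_i - h_{inf}) - b - a x_i \right)^2 . \]
This requires \(h_{inf} < \min_i t_i\), so that every logarithm is defined. The criterion measures errors on the log scale, not in hours, which is why two kinds of residuals are computed in the code.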
# first try with a limit value of 2 hours
library(ggplot2) # provides qplot()
h_inf = 2
linmodel=lm(formula=log(time-h_inf)~numdate,data=df )
#summary(linmodel)
a=linmodel$coefficients[2] # slope
b=linmodel$coefficients[1] # intercept
resid_lin = linmodel$residuals # residuals on the log scale
resid_data = df$time - (h_inf+exp(b+df$numdate*a)) # residuals in hours
qplot(df$numdate,resid_lin, main = 'Residuals in the linear model')
qplot(df$numdate,resid_data, main = 'Residuals in the original model')
SSE_lin = sum((resid_lin)^2)
SSE_data = sum((resid_data)^2)
The estimates are
a
## numdate
## -0.02422051
b
## (Intercept)
## 46.02103
SSE_lin
## [1] 0.596249
SSE_data
## [1] 0.09515596
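These estimates can be read more concretely. Since \(h(x) - h_{inf} = e^{b+ax}\), the gap to the limit is halved every \(\log(2)/|a|\) years. The sketch below computes this half-life, together with the fitted value at the date of the last record in the Appendix (2014.829). The coefficients are copied by hand from the output above, so the results are only as accurate as that rounding; the names `half_life` and `fitted_last` are introduced here for illustration.

```r
# Estimates printed above for h_inf = 2 (copied by hand, hence rounded)
h_inf <- 2
a <- -0.02422051
b <- 46.02103

# Number of years needed for the gap to h_inf to be divided by two
half_life <- log(2) / abs(a)

# Fitted record (in hours) at the date of the last observation
fitted_last <- h_inf + exp(b + a * 2014.829)

half_life
fitted_last
```

With these rounded values, the half-life is roughly 28.6 years and the fitted value is about 2.06 hours, against an observed record of 2.049167 hours.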
Here is the estimated curve:
We now test \(N = 100\) candidate values of \(h_{inf}\) on a regular grid over \(1.8\leq h_{inf}\leq 2.01\), refitting \(a\) and \(b\) for each of them.
N=100 # number of values of h_inf tested
h_vect = seq(1.8,2.01,length=N)
SSE_vect_lin = rep(0,N)
SSE_vect_data = rep(0,N)
for(i in 1:N){
  h_inf = h_vect[i]
  linmodel=lm(formula=log(time-h_inf)~numdate,data=df )
  #summary(linmodel)
  a=linmodel$coefficients[2]
  b=linmodel$coefficients[1]
  resid_lin = linmodel$residuals # residuals on the log scale
  resid_data = df$time - (h_inf+exp(b+df$numdate*a)) # residuals in hours
  SSE_vect_lin[i] = sum((resid_lin)^2)
  SSE_vect_data[i] = sum((resid_data)^2)
}
Plotting the sum of squared errors, measured on the scale of the original model, against \(h_{inf}\) shows that there is an optimal value of \(h_{inf}\).
qplot(h_vect,SSE_vect_data, main = 'Sum of square errors in terms of h_inf -- original model')
#qplot(h_vect,SSE_vect_lin, main = 'Sum of square errors -- linear model')
It is obtained for \(h_{inf}\) equal to:
h_opt = h_vect[which.min(SSE_vect_data)]
h_opt
## [1] 1.963333
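The grid search restricts \(h_{inf}\) to the 100 tested values. One possible alternative, shown here only as a sketch, is to give the same criterion to the base R function `optimize()`, which searches a continuous interval. To stay self-contained, the example uses synthetic, noiseless data generated with a known limit of 1.95; it does not use the marathon data, and `sse_original` is a hypothetical helper name.

```r
# Synthetic records (illustration only, NOT the marathon data):
# generated from the model with a known limit h_inf = 1.95
numdate <- seq(1910, 2010, by = 5)
time <- 1.95 + exp(40 - 0.021 * numdate)

# Same criterion as in the loop: fit a and b on the log scale,
# then measure the squared error on the original scale.
sse_original <- function(h_inf, numdate, time) {
  fit <- lm(log(time - h_inf) ~ numdate)
  b <- coef(fit)[1]
  a <- coef(fit)[2]
  sum((time - (h_inf + exp(b + a * numdate)))^2)
}

# The upper end of the interval must stay below min(time)
opt <- optimize(sse_original, interval = c(1.8, 2.05),
                numdate = numdate, time = time)
h_opt_sketch <- opt$minimum
h_opt_sketch
```

On this synthetic example the search should return a value close to the limit used to generate the data. On real data the criterion is not guaranteed to have a single minimum, so plotting it over a grid, as done above, remains a useful check.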
We now plot the curve obtained with the optimal \(h_{inf}\).
# refit with the optimal limit value found above
h_inf = h_opt
linmodel=lm(formula=log(time-h_inf)~numdate,data=df )
#summary(linmodel)
a=linmodel$coefficients[2]
b=linmodel$coefficients[1]
resid_lin = linmodel$residuals # residuals on the log scale
resid_data = df$time - (h_inf+exp(b+df$numdate*a)) # residuals in hours
qplot(df$numdate,resid_lin, main = 'Residuals in the linear model')
qplot(df$numdate,resid_data, main = 'Residuals in the original model')
SSE_lin = sum((resid_lin)^2)
SSE_data = sum((resid_data)^2)
The estimates are
a
## numdate
## -0.02043101
b
## (Intercept)
## 38.7836
SSE_lin
## [1] 0.3598324
SSE_data
## [1] 0.08251888
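Because the optimal \(h_{inf}\) lies below 2 hours, the fitted curve eventually crosses the 2-hour mark. Solving \(h(x) = 2\) for \(x\) gives \(x = (\log(2 - h_{inf}) - b)/a\). The sketch below evaluates this with the rounded estimates printed above; the names `target` and `x_cross` are introduced here for illustration. The result is a long extrapolation that is very sensitive to \(h_{inf}\), so it should be read as a property of this fitted curve and not as a forecast.

```r
# Estimates printed above for the optimal h_inf (copied by hand, hence rounded)
h_inf <- 1.963333
a <- -0.02043101
b <- 38.7836

# Date at which the fitted curve reaches a target time (in hours);
# only defined when target > h_inf.
target <- 2
x_cross <- (log(target - h_inf) - b) / a
x_cross
```

With these rounded values the crossing date comes out at around 2060.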
Here is the estimated curve:
Appendix: the cleaned data set.
df[,2:9]
## date time numdate Time Name
## 1 1908-07-24 2.921778 1908.650 2:55:18.4 Johnny Hayes
## 2 1909-01-01 2.879278 1909.091 2:52:45.4 Robert Fowler
## 3 1909-02-12 2.781333 1909.206 2:46:52.8 James Clark
## 4 1909-05-08 2.767944 1909.439 2:46:04.6 Albert Raines
## 5 1909-05-26 2.708611 1909.488 2:42:31.0 Henry Barrett
## 6 1909-08-31 2.676167 1909.753 2:40:34.2 Thure Johansson
## 7 1913-05-12 2.637833 1913.449 2:38:16.2 Harry Green
## 8 1913-05-31 2.601833 1913.501 2:36:06.6 Alexis Ahlgren
## 9 1920-08-22 2.543278 1920.729 2:32:35.8 Hannes Kolehmainen
## 10 1925-10-12 2.483833 1925.868 2:29:01.8 Albert Michelsen
## 11 1935-03-21 2.437222 1935.306 2:26:14 Son Kitei
## 12 1947-04-19 2.427500 1947.385 2:25:39 Suh Yun-bok
## 13 1952-06-14 2.345056 1952.540 2:20:42.2 Jim Peters
## 14 1953-06-13 2.311222 1953.537 2:18:40.4 Jim Peters
## 15 1953-10-04 2.309667 1953.846 2:18:34.8 Jim Peters
## 16 1954-06-26 2.294278 1954.572 2:17:39.4 Jim Peters
## 17 1958-08-24 2.254722 1958.734 2:15:17.0 Sergei Popov
## 18 1960-09-10 2.254500 1960.781 2:15:16.2 Abebe Bikila
## 19 1963-02-17 2.254389 1963.218 2:15:15.8 Toru Terasawa
## 20 1963-06-15 2.241111 1963.541 2:14:28 Leonard Edelen
## 21 1964-06-13 2.231944 1964.538 2:13:55 Basil Heatley
## 22 1964-10-21 2.203389 1964.894 2:12:12.2 Abebe Bikila
## 23 1965-06-12 2.200000 1965.534 2:12:00 Morio Shigematsu
## 24 1967-12-03 2.160111 1968.009 2:09:36.4 Derek Clayton
## 25 1969-05-30 2.142667 1969.499 2:08:33.6 Derek Clayton
## 26 1981-12-06 2.138333 1982.019 2:08:18 Robert De Castella
## 27 1984-10-21 2.134722 1984.894 2:08:05 Steve Jones
## 28 1985-04-20 2.120000 1985.389 2:07:12 Carlos Lopes
## 29 1988-04-17 2.113889 1988.382 2:06:50 Belayneh Dinsamo
## 30 1998-09-20 2.101389 1998.807 2:06:05 Ronaldo da Costa
## 31 1999-10-24 2.095000 1999.900 2:05:42 Khalid Khannouchi
## 32 2002-04-14 2.093889 2002.372 2:05:38 Khalid Khannouchi
## 33 2003-09-28 2.081944 2003.829 2:04:55 Paul Tergat
## 34 2007-09-30 2.073889 2007.834 2:04:26 Haile Gebrselassie
## 35 2008-09-28 2.066389 2008.831 2:03:59 Haile Gebrselassie
## 36 2011-09-25 2.060556 2011.820 2:03:38 Patrick Makau
## 37 2013-09-29 2.056389 2013.833 2:03:23 Wilson Kipsang
## 38 2014-09-28 2.049167 2014.829 2:02:57 Dennis Kimetto
## Nationality Date
## 1 United States July 24, 1908
## 2 United States January 1, 1909
## 3 United States February 12, 1909
## 4 United States May 8, 1909
## 5 United Kingdom May 26, 1909
## 6 Sweden August 31, 1909
## 7 United Kingdom May 12, 1913
## 8 Sweden May 31, 1913
## 9 Finland August 22, 1920
## 10 United States October 12, 1925
## 11 Empire of Japan[52] March 21, 1935
## 12 Korea April 19, 1947
## 13 United Kingdom June 14, 1952
## 14 United Kingdom June 13, 1953
## 15 United Kingdom October 4, 1953
## 16 United Kingdom June 26, 1954
## 17 Soviet Union August 24, 1958
## 18 Ethiopia September 10, 1960
## 19 Japan February 17, 1963
## 20 United States June 15, 1963
## 21 United Kingdom June 13, 1964
## 22 Ethiopia October 21, 1964
## 23 Japan June 12, 1965
## 24 Australia December 3, 1967
## 25 Australia May 30, 1969
## 26 Australia December 6, 1981
## 27 United Kingdom October 21, 1984
## 28 Portugal April 20, 1985
## 29 Ethiopia April 17, 1988
## 30 Brazil September 20, 1998
## 31 Morocco October 24, 1999
## 32 United States April 14, 2002
## 33 Kenya September 28, 2003
## 34 Ethiopia September 30, 2007
## 35 Ethiopia September 28, 2008
## 36 Kenya September 25, 2011
## 37 Kenya September 29, 2013
## 38 Kenya September 28, 2014
## Event.Place
## 1 London, United Kingdom
## 2 Yonkers,[nb 5] United States
## 3 New York City, United States
## 4 New York City, United States
## 5 Polytechnic Marathon,London, United Kingdom
## 6 Stockholm, Sweden
## 7 Polytechnic Marathon
## 8 Polytechnic Marathon
## 9 Antwerp, Belgium
## 10 Port Chester, United States
## 11 Tokyo, Japan
## 12 Boston Marathon
## 13 Polytechnic Marathon
## 14 Polytechnic Marathon
## 15 Turku Marathon
## 16 Polytechnic Marathon
## 17 Stockholm, Sweden
## 18 Rome, Italy
## 19 Beppu-Ōita Marathon
## 20 Polytechnic Marathon
## 21 Polytechnic Marathon
## 22 Tokyo, Japan
## 23 Polytechnic Marathon
## 24 Fukuoka Marathon
## 25 Antwerp, Belgium
## 26 Fukuoka Marathon
## 27 Chicago Marathon
## 28 Rotterdam Marathon
## 29 Rotterdam Marathon
## 30 Berlin Marathon
## 31 Chicago Marathon
## 32 London Marathon
## 33 Berlin Marathon
## 34 Berlin Marathon
## 35 Berlin Marathon
## 36 Berlin Marathon
## 37 Berlin Marathon
## 38 Berlin Marathon