1 Data

The data come from https://en.wikipedia.org/wiki/Marathon_world_record_progression. A cleaned version in csv format, with non-record values removed, is available from https://www.dropbox.com/s/qyeitjqvsjmo2gn/marathon_clean.csv?dl=0; the full table is also printed in the Appendix below.
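The Appendix table carries each record twice: as a clock string (column `Time`, for example `2:55:18.4`) and in decimal hours (column `time`), which is the variable used in the fits. A minimal sketch of that conversion follows, assuming the strings always have the form `H:MM:SS` with optional decimals. The helper name `to_hours` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: convert "H:MM:SS(.s)" strings to decimal hours.
to_hours <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] + p[2] / 60 + p[3] / 3600
  }, numeric(1))
}

# Two rows taken from the Appendix table
to_hours(c("2:55:18.4", "2:26:14"))
```

These two calls reproduce, up to rounding, the `time` values 2.921778 and 2.437222 listed for those records in the Appendix.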

2 Modeling

We assume the existence of a lower bound on the record, denoted \(h_{inf}\), and look for the best curve of the form \(h(x) = h_{inf}+e^{b+ax}\), where \(x\) is the date in decimal years and \(h(x)\) is the record time in hours. For a fixed \(h_{inf}\), the relation can be rewritten as \(\log(h(x) - h_{inf})=b+ax\), so \(a\) and \(b\) can be estimated by least squares.
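Spelled out, the linearization is a subtraction followed by a logarithm:

\[
h(x) = h_{inf} + e^{b+ax}
\;\Longleftrightarrow\;
h(x) - h_{inf} = e^{b+ax}
\;\Longleftrightarrow\;
\log\bigl(h(x) - h_{inf}\bigr) = b + ax .
\]

The logarithm is only defined when \(h_{inf}\) lies strictly below every observed record time, and the curve decreases toward \(h_{inf}\) only if \(a<0\).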

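library(ggplot2) # provides qplot

# assumes df holds the columns time (hours) and numdate (decimal year), as in the Appendix
# first try with a limit value of 2 hours
h_inf = 2
# linearized model: log(time - h_inf) = b + a * numdate
linmodel = lm(formula = log(time - h_inf) ~ numdate, data = df)
#summary(linmodel)

a = linmodel$coefficients[2] # slope
b = linmodel$coefficients[1] # intercept

# residuals on the log scale and on the original scale (hours)
resid_lin = linmodel$residuals
resid_data = df$time - (h_inf + exp(b + df$numdate * a))
qplot(numdate, resid_lin, data = df, main = 'Residuals in the linear model')

qplot(numdate, resid_data, data = df, main = 'Residuals in the original model')

SSE_lin = sum((resid_lin)^2)
SSE_data = sum((resid_data)^2)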

The estimates are

a
##     numdate 
## -0.02422051
b
## (Intercept) 
##    46.02103
SSE_lin
## [1] 0.596249
SSE_data
## [1] 0.09515596
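
The two error sums reported above are computed on different scales. Writing \((x_i, t_i)\), \(i=1,\dots,n\), for the (`numdate`, `time`) pairs (notation introduced here for convenience):

\[
SSE_{lin} = \sum_{i=1}^{n}\bigl(\log(t_i - h_{inf}) - b - a x_i\bigr)^2,
\qquad
SSE_{data} = \sum_{i=1}^{n}\bigl(t_i - h_{inf} - e^{b + a x_i}\bigr)^2 .
\]

The regression minimizes \(SSE_{lin}\) for the given \(h_{inf}\), while \(SSE_{data}\) measures the same fit in hours. Since the log transform itself changes with \(h_{inf}\), \(SSE_{data}\) is the more natural quantity for comparing different values of \(h_{inf}\).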

Here is the estimated curve:
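
As a sketch of what this curve represents, the reported estimates can be plugged back into \(h(x) = h_{inf}+e^{b+ax}\). The function name `fitted_record` is hypothetical, and the coefficients are the rounded values printed above, so the results are approximate.

```r
# Estimates printed above for h_inf = 2 (rounded as displayed)
h_inf <- 2
a <- -0.02422051
b <- 46.02103

# Hypothetical helper: fitted record time (hours) at a decimal year x
fitted_record <- function(x) h_inf + exp(b + a * x)

# Fitted values at the dates of the first and last records in the Appendix
fitted_record(c(1908.650, 2014.829))
```

Because the exponential term is always positive, every fitted value stays above \(h_{inf}\) and approaches it as \(x\) grows, since \(a<0\).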

3 Check of best value of \(h_{inf}\)

We test a grid of 100 candidate values in the range \(1.8\leq h_{inf}\leq 2.01\), refitting the linear model for each one.

N = 100 # number of values of h_inf tested
h_vect = seq(1.8, 2.01, length = N)
SSE_vect_lin = rep(0, N)
SSE_vect_data = rep(0, N)
for (i in 1:N) {
  h_inf = h_vect[i]
  linmodel = lm(formula = log(time - h_inf) ~ numdate, data = df)
  #summary(linmodel)

  a = linmodel$coefficients[2]
  b = linmodel$coefficients[1]

  # store both error sums for this candidate h_inf
  resid_lin = linmodel$residuals
  resid_data = df$time - (h_inf + exp(b + df$numdate * a))
  SSE_vect_lin[i] = sum((resid_lin)^2)
  SSE_vect_data[i] = sum((resid_data)^2)
}

Plotting the sum of squared residuals on the original scale (hours) against \(h_{inf}\) shows that there is an optimum.

qplot(h_vect,SSE_vect_data, main = 'Sum of square errors in terms of h_inf -- original model')

#qplot(h_vect,SSE_vect_lin, main = 'Sum of square errors -- linear model')

The minimum is attained for \(h_{inf}\) equal to:

h_opt = h_vect[which.min(SSE_vect_data)]
h_opt
## [1] 1.963333
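
The grid search could also be expressed as a one-dimensional minimization with `optimize` from the `stats` package. The sketch below is an alternative formulation, not the computation used in this document. It runs on synthetic, noise-free data generated from the model with assumed parameters (true \(h_{inf}=2\), \(a=-0.021\), \(b=40\)), and the function name `sse_data` is hypothetical.

```r
# Synthetic, noise-free data from h(x) = h_inf + exp(b + a*x)
# (assumed parameters, for illustration only; not the marathon data)
x <- seq(1910, 2010, by = 5)
y <- 2 + exp(40 - 0.021 * x)

# Hypothetical helper: SSE on the original scale for a candidate h_inf
sse_data <- function(h_inf) {
  fit <- lm(log(y - h_inf) ~ x)
  sum((y - (h_inf + exp(fitted(fit))))^2)
}

# Search the same interval as the grid; every candidate must stay below min(y)
opt <- optimize(sse_data, interval = c(1.8, 2.01))
opt$minimum
```

On these synthetic data the minimizer should land close to the true value 2. `optimize` assumes a single minimum in the interval, so plotting the SSE curve first, as done above, remains a useful check.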

4 Model with the best fit

We now plot the curve obtained with the optimal \(h_{inf}\).

# refit with the optimal limit value found above
h_inf = h_opt
linmodel = lm(formula = log(time - h_inf) ~ numdate, data = df)
#summary(linmodel)

a = linmodel$coefficients[2] # slope
b = linmodel$coefficients[1] # intercept

# residuals on the log scale and on the original scale (hours)
resid_lin = linmodel$residuals
resid_data = df$time - (h_inf + exp(b + df$numdate * a))
qplot(numdate, resid_lin, data = df, main = 'Residuals in the linear model')

qplot(numdate, resid_data, data = df, main = 'Residuals in the original model')

SSE_lin = sum((resid_lin)^2)
SSE_data = sum((resid_data)^2)

The estimates are

a
##     numdate 
## -0.02043101
b
## (Intercept) 
##     38.7836
SSE_lin
## [1] 0.3598324
SSE_data
## [1] 0.08251888
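
The fitted lower bound is easier to interpret as a clock time. A small sketch, with a hypothetical helper `to_clock` that rounds to the nearest second:

```r
# Hypothetical helper: decimal hours -> "H:MM:SS", rounded to the nearest second
to_clock <- function(h) {
  s <- as.integer(round(h * 3600))
  sprintf("%d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

to_clock(1.963333) # the optimal h_inf found above
## [1] "1:57:48"
```

Under this model, the limit of 1.963333 hours corresponds to a marathon time of about 1:57:48.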

Here is the estimated curve:

5 Appendix: Data

df[,2:9]
##          date     time  numdate      Time               Name
## 1  1908-07-24 2.921778 1908.650 2:55:18.4       Johnny Hayes
## 2  1909-01-01 2.879278 1909.091 2:52:45.4      Robert Fowler
## 3  1909-02-12 2.781333 1909.206 2:46:52.8        James Clark
## 4  1909-05-08 2.767944 1909.439 2:46:04.6      Albert Raines
## 5  1909-05-26 2.708611 1909.488 2:42:31.0      Henry Barrett
## 6  1909-08-31 2.676167 1909.753 2:40:34.2    Thure Johansson
## 7  1913-05-12 2.637833 1913.449 2:38:16.2        Harry Green
## 8  1913-05-31 2.601833 1913.501 2:36:06.6     Alexis Ahlgren
## 9  1920-08-22 2.543278 1920.729 2:32:35.8 Hannes Kolehmainen
## 10 1925-10-12 2.483833 1925.868 2:29:01.8   Albert Michelsen
## 11 1935-03-21 2.437222 1935.306   2:26:14          Son Kitei
## 12 1947-04-19 2.427500 1947.385   2:25:39        Suh Yun-bok
## 13 1952-06-14 2.345056 1952.540 2:20:42.2         Jim Peters
## 14 1953-06-13 2.311222 1953.537 2:18:40.4         Jim Peters
## 15 1953-10-04 2.309667 1953.846 2:18:34.8         Jim Peters
## 16 1954-06-26 2.294278 1954.572 2:17:39.4         Jim Peters
## 17 1958-08-24 2.254722 1958.734 2:15:17.0       Sergei Popov
## 18 1960-09-10 2.254500 1960.781 2:15:16.2       Abebe Bikila
## 19 1963-02-17 2.254389 1963.218 2:15:15.8      Toru Terasawa
## 20 1963-06-15 2.241111 1963.541   2:14:28     Leonard Edelen
## 21 1964-06-13 2.231944 1964.538   2:13:55      Basil Heatley
## 22 1964-10-21 2.203389 1964.894 2:12:12.2       Abebe Bikila
## 23 1965-06-12 2.200000 1965.534   2:12:00   Morio Shigematsu
## 24 1967-12-03 2.160111 1968.009 2:09:36.4      Derek Clayton
## 25 1969-05-30 2.142667 1969.499 2:08:33.6      Derek Clayton
## 26 1981-12-06 2.138333 1982.019   2:08:18 Robert De Castella
## 27 1984-10-21 2.134722 1984.894   2:08:05        Steve Jones
## 28 1985-04-20 2.120000 1985.389   2:07:12       Carlos Lopes
## 29 1988-04-17 2.113889 1988.382   2:06:50   Belayneh Dinsamo
## 30 1998-09-20 2.101389 1998.807   2:06:05   Ronaldo da Costa
## 31 1999-10-24 2.095000 1999.900   2:05:42  Khalid Khannouchi
## 32 2002-04-14 2.093889 2002.372   2:05:38  Khalid Khannouchi
## 33 2003-09-28 2.081944 2003.829   2:04:55        Paul Tergat
## 34 2007-09-30 2.073889 2007.834   2:04:26 Haile Gebrselassie
## 35 2008-09-28 2.066389 2008.831   2:03:59 Haile Gebrselassie
## 36 2011-09-25 2.060556 2011.820   2:03:38      Patrick Makau
## 37 2013-09-29 2.056389 2013.833   2:03:23     Wilson Kipsang
## 38 2014-09-28 2.049167 2014.829   2:02:57     Dennis Kimetto
##             Nationality               Date
## 1         United States      July 24, 1908
## 2         United States    January 1, 1909
## 3         United States  February 12, 1909
## 4         United States        May 8, 1909
## 5        United Kingdom       May 26, 1909
## 6                Sweden    August 31, 1909
## 7        United Kingdom       May 12, 1913
## 8                Sweden       May 31, 1913
## 9               Finland    August 22, 1920
## 10        United States   October 12, 1925
## 11  Empire of Japan[52]     March 21, 1935
## 12                Korea     April 19, 1947
## 13       United Kingdom      June 14, 1952
## 14       United Kingdom      June 13, 1953
## 15       United Kingdom    October 4, 1953
## 16       United Kingdom      June 26, 1954
## 17         Soviet Union    August 24, 1958
## 18             Ethiopia September 10, 1960
## 19                Japan  February 17, 1963
## 20        United States      June 15, 1963
## 21       United Kingdom      June 13, 1964
## 22             Ethiopia   October 21, 1964
## 23                Japan      June 12, 1965
## 24            Australia   December 3, 1967
## 25            Australia       May 30, 1969
## 26            Australia   December 6, 1981
## 27       United Kingdom   October 21, 1984
## 28             Portugal     April 20, 1985
## 29             Ethiopia     April 17, 1988
## 30               Brazil September 20, 1998
## 31              Morocco   October 24, 1999
## 32        United States     April 14, 2002
## 33                Kenya September 28, 2003
## 34             Ethiopia September 30, 2007
## 35             Ethiopia September 28, 2008
## 36                Kenya September 25, 2011
## 37                Kenya September 29, 2013
## 38                Kenya September 28, 2014
##                                    Event.Place
## 1                       London, United Kingdom
## 2                 Yonkers,[nb 5] United States
## 3                 New York City, United States
## 4                 New York City, United States
## 5  Polytechnic Marathon,London, United Kingdom
## 6                            Stockholm, Sweden
## 7                         Polytechnic Marathon
## 8                         Polytechnic Marathon
## 9                             Antwerp, Belgium
## 10                 Port Chester, United States
## 11                                Tokyo, Japan
## 12                             Boston Marathon
## 13                        Polytechnic Marathon
## 14                        Polytechnic Marathon
## 15                              Turku Marathon
## 16                        Polytechnic Marathon
## 17                           Stockholm, Sweden
## 18                                 Rome, Italy
## 19                         Beppu-Ōita Marathon
## 20                        Polytechnic Marathon
## 21                        Polytechnic Marathon
## 22                                Tokyo, Japan
## 23                        Polytechnic Marathon
## 24                            Fukuoka Marathon
## 25                            Antwerp, Belgium
## 26                            Fukuoka Marathon
## 27                            Chicago Marathon
## 28                          Rotterdam Marathon
## 29                          Rotterdam Marathon
## 30                             Berlin Marathon
## 31                            Chicago Marathon
## 32                             London Marathon
## 33                             Berlin Marathon
## 34                             Berlin Marathon
## 35                             Berlin Marathon
## 36                             Berlin Marathon
## 37                             Berlin Marathon
## 38                             Berlin Marathon