Records are moments of particular interest in many areas, such as weather, finance and sports. Simple models yield extremely general results, but they rest on unrealistic assumptions that make them unsuitable for most applications. Instead, models such as the linear drift model are used to describe the progression of records, for example when analysing the winning times of the Sydney to Hobart yacht race.
Every year at noon on Boxing Day, hundreds of yachts line up in Sydney Harbour before a cannon is fired and the annual Sydney to Hobart yacht race takes off, with the goal of setting a new record for the fastest time to sail from Sydney to Hobart. Throughout its 80-year history, 14 new record times have been set, as you can see in the plot below, from the inaugural winner Rani, which took 6 days, 14 hours and 22 minutes, down to the current record holder Comanche, which took a mere 1 day, 9 hours, 15 minutes and 24 seconds.
When analysing records in a more mathematical setting, the simplest case is a sequence of independent and identically distributed (i.i.d.) random variables. This is a very convenient model that yields an extremely broad range of results, such as the fact that the chance of the nth observation being a record is exactly 1/n. From this, the exact distributions of the record values and of the times at which they occur can be derived, no matter the actual underlying distribution (provided it is continuous, so that ties have probability zero).
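The 1/n result can be checked with a quick simulation. The following is a minimal sketch, not part of the original research: the function name `record_frequencies`, the choice of a uniform distribution, and the trial counts are all illustrative assumptions.

```python
import random

def record_frequencies(n_obs, n_trials, seed=0):
    """Estimate P(observation n is a record) for n = 1..n_obs under i.i.d. sampling."""
    rng = random.Random(seed)
    counts = [0] * n_obs
    for _ in range(n_trials):
        current_max = float("-inf")
        for i in range(n_obs):
            # Uniform draws are an arbitrary choice; the result should not
            # depend on which continuous distribution is used.
            x = rng.random()
            if x > current_max:
                current_max = x
                counts[i] += 1  # observation i+1 set a new record
    return [c / n_trials for c in counts]

freqs = record_frequencies(n_obs=10, n_trials=20000)
for n, f in enumerate(freqs, start=1):
    print(f"n={n:2d}  simulated={f:.3f}  1/n={1/n:.3f}")
```

The simulated frequencies should sit close to 1/n for each n, and swapping `rng.random()` for another continuous sampler such as `rng.gauss(0, 1)` should leave them essentially unchanged.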
One of these interesting results is that the mean time until the second record occurs (the first non-trivial one, since the first observation is always a record) is infinite. This seems paradoxical when you consider that half the time it occurs at the second observation, which follows from the 1/n chance with n = 2. The resolution is that in some simulations the second record won’t occur for an extremely long time: the waiting time has what we call a heavy-tailed distribution.
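One way to see where the infinite mean comes from is the short derivation below. It is a sketch using notation introduced here for illustration (N for the index of the second record, X_1, X_2, ... for the observations), it assumes a continuous underlying distribution, and it needs the amsmath package to typeset.

```latex
% N = index of the second record in an i.i.d. sequence X_1, X_2, ...
% The second record has not yet occurred by time n exactly when X_1 is
% still the largest of the first n observations.
\begin{align*}
P(N > n) &= P\bigl(X_1 = \max(X_1, \dots, X_n)\bigr) = \frac{1}{n}, \\
P(N = n) &= \frac{1}{n-1} - \frac{1}{n} = \frac{1}{n(n-1)}, \qquad n \ge 2, \\
E[N] &= \sum_{n=2}^{\infty} \frac{n}{n(n-1)} = \sum_{n=2}^{\infty} \frac{1}{n-1} = \infty .
\end{align*}
```

Setting n = 2 in the second line recovers the probability of one half mentioned above, while the last line is the harmonic series, which diverges.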
However, the i.i.d. model is far too simplistic for essentially all practical applications. The main problem is the assumption that each observation is identically distributed, which implies that there is no overall trend through time, unlike what is seen in real-world data. Think of climate change increasing average temperatures, so that maximum temperature records are broken more frequently, or, in the Sydney to Hobart example, the downward trend in winning race times due to technological advancements.
Instead, a more realistic model, which I explored in my research, is the linear drift model, which imposes an overall linear trend on the sequence of observations, like fitting a regression line. Unfortunately, analysis of records within the linear drift model is far more involved and must be done on a case-by-case basis, with no extremely general closed-form expressions like those in the i.i.d. case. However, I did find that for an underlying exponential distribution, introducing a positive linear drift reduces the expected time to the second record to a finite value.
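The effect of the drift can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis from the research itself: it assumes the linear drift model takes the form X_n = Y_n + c·n with Y_n i.i.d. exponential with rate 1, and the function names, the drift value c = 0.5, and the cap `max_steps` are all hypothetical choices made here. The cap is needed because, without drift, a single run can take an astronomically long time.

```python
import random

def time_to_second_record(drift, rng, max_steps=10_000):
    """Index of the first observation exceeding X_1, capped at max_steps."""
    # Assumed model: X_n = Y_n + drift * n, with Y_n ~ Exponential(rate 1).
    first = rng.expovariate(1.0) + drift * 1
    for n in range(2, max_steps + 1):
        x = rng.expovariate(1.0) + drift * n
        if x > first:
            return n
    return max_steps  # censored: no second record within the cap

def mean_time(drift, n_trials=5000, max_steps=10_000, seed=1):
    """Sample mean of the (capped) time to the second record."""
    rng = random.Random(seed)
    total = sum(time_to_second_record(drift, rng, max_steps)
                for _ in range(n_trials))
    return total / n_trials

print("no drift (capped):", mean_time(0.0))
print("drift c = 0.5:    ", mean_time(0.5))
```

Without drift, the capped sample mean is unstable and keeps growing as `max_steps` is raised, reflecting the infinite mean. With a positive drift c it settles near a small value: under the assumed model, the second record can only be delayed past time n if Y_1 exceeds c(n-1), an event whose probability decays geometrically in n.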
Angus Stewart
Macquarie University