Abstract

This paper examines the Head of the Charles primarily as it pertains to collegiate rowing. It evaluates the relative competitiveness of each event and takes a detailed look at how the 4s and 8s rowed the course and how the course itself affected the outcome of the race. Overall, it appears that boats that rowed their strongest split at the start finished better, even though they dropped in the distribution later in the race.

Introduction

The Head of the Charles started in 1964 and today is the largest regatta in the world. The regatta draws almost 11,000 rowers from 25 countries. However, the United States dominates the delegation of rowers, with Canada and Great Britain in distant second and third, respectively.

Within the United States, most rowers are from the Northeast or elsewhere on the Eastern Seaboard.

Only 36 of 50 states have rowers participating in the race.

The Head of the Charles is a 4800m “head” race in Cambridge, MA. It is a head race because the boats are staggered when they start by their seed and places are determined by times. The course itself is on the Charles River, which has several turns, bridges, and bottlenecks that make the course itself a challenge to navigate.

Before going any further, a few rowing concepts need to be clarified. First is the coxswain. In larger boats, 4+s and 8+s, the coxswain is responsible for steering the boat and motivating the rowers. The coxswain sits either in the stern (typical in the 8+) or in the bow (typical in the 4+). There are also boats that do not have a coxswain; these are often sculling boats (quads, doubles, and singles). In these boats, one of the rowers is responsible for steering as well as rowing. Second is the concept of the average 500 m split. In order to standardize the concept of speed in crew, it is convention to talk about 500 m splits. When described as an average for the whole piece (race), the 500 m split is the average time it takes to row 500 m. The output looks like “1:50”; in words, it took one minute and 50 seconds to row 500 m on average. This unit of measure has been standardized by its use on the erg (the indoor rowing machine).
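As a quick illustration of the split arithmetic described above, the Python sketch below converts a total time over a given distance into an average 500 m split and formats it as minutes and seconds. The function names are hypothetical and the sketch is not the code used for the analysis, which was done in R.

```python
def average_split_seconds(total_seconds, distance_m):
    """Average time, in seconds, needed to cover 500 m."""
    return total_seconds / (distance_m / 500.0)


def format_split(split_seconds):
    """Format a split in seconds as m:ss.s (for example 110.0 -> '1:50.0')."""
    tenths = round(split_seconds * 10)      # work in whole tenths of a second
    minutes, rem = divmod(tenths, 600)      # 600 tenths per minute
    return f"{minutes}:{rem // 10:02d}.{rem % 10}"


# A 4800 m piece rowed in 1056 seconds averages 110 seconds per 500 m.
example_split = format_split(average_split_seconds(1056, 4800))
print(example_split)
```

A 4800 m race covers 9.6 segments of 500 m, so the average split is simply the total time divided by 9.6.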

Research Questions

The Head of the Charles can be an overwhelmingly complicated and competitive event. It is important to place well, usually defined as finishing in the top half of one’s event (above the median), in order to guarantee an entry in next year’s race. Teams that fail to meet this benchmark lose their bid and must reenter the lottery process to receive a bid to race, so schools that fail to maintain their spots have no guarantee of being able to return the following year. It is therefore important for coaches, who have a finite number of rowers, to enter them in events in which they can be competitive. For this reason, I chose to evaluate which races were the most competitive, so as to better inform coaches how to enter their boats.

Once I determined which races were most competitive, I focused on determining how a coach should go about making sure that his/her boats are prepared to row a good race. I therefore looked at the ways in which the race course itself affected how the coxswain should steer the race and how the rowers should row it.

Data

The data used in this paper primarily originated from Regatta Master, which is an online result reporting site used by the Head of the Charles. The website has a function which exports csv files. I used the master results by entry data set, which had information on official times and penalties for all 61 events. However, in order to get detailed results including the four split times, it was necessary to pull each race separately. Therefore, I focused my main split analysis on only 4s and 8s in the club, collegiate, championship, and lightweight categories. I also pulled the event summary sheet, which outlined the details for each race, and the list of clubs which gave the location of each participating club in the regatta. [See Appendix I for more details on the condition of the csv files and how they were read into R.]

The data used to produce the participation maps came from several sources. The data on postal codes was from infoplease.com; the table was copied directly from the website into Excel and saved as a csv. The latitude and longitude data came from two sources. Most points came from simplemaps.com; however, about 60 points needed to be input by hand from latlong.net directly into the csv file.

The shape files used to make the race course polygon were made entirely using Google Maps. Using a printout of the race course and a list of where the umpires were, I input each latitude and longitude point into Excel by hand from Google Maps. I used between four and ten points per penalty zone and ordered them accordingly, by group, to make a shape data frame. Then I simply added the Weld Boathouse line to the shapefile and reordered the points to make the split polygon data frame.

Methodology

The main metric that I use to determine competitiveness is the coefficient of variation (CV), which is equal to the standard deviation over the mean.

\[CV = \frac{\sigma}{\mu}\]

Although the standard deviation alone can provide a good measure of clustering around the mean, it is not comparable between samples with different means. Therefore, in order to compare all the different types of boats and crews, it is necessary to “standardize” the standard deviation by dividing it by the sample mean. This allows for a better comparison of relative clustering or dispersion between events. Moreover, the median and average of the coefficients of variation across all the events can be computed in order to make generalizations about the race course and the regatta as a whole.
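To make the comparison concrete, here is a minimal Python sketch of the coefficient of variation applied to two events with different mean times. The finish times are invented for illustration, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(times):
    """Sample standard deviation divided by the mean (assumed estimator)."""
    return stdev(times) / mean(times)


# Hypothetical finish times in seconds for two events (not real results).
eights_times = [880.0, 895.0, 910.0]
singles_times = [1100.0, 1150.0, 1200.0]

cv_eights = coefficient_of_variation(eights_times)
cv_singles = coefficient_of_variation(singles_times)

# The event with the smaller CV is the more tightly packed, i.e. more competitive.
print(round(cv_eights, 4), round(cv_singles, 4))
```

Because each standard deviation is scaled by its own event mean, the two values can be compared directly even though singles are much slower than eights in raw time.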

In order to find out which section of the course had the largest effect on the race, I used the z-score of each boat’s finish within the event in which it was entered. The z-score is a standardized measure used to compare observations within a given sample. It factors in the observation’s deviation from the mean as well as the standard deviation of the entire distribution.

\[Z = \frac{x - \mu}{\sigma}\]
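The formula can be sketched in Python as follows, standardizing a handful of values within a single event. The input values are invented, the sample standard deviation is an assumption, and the function name is hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values within one event: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumed)
    return [(v - mu) / sigma for v in values]


# Hypothetical values for three boats in the same event.
event_values = [880.0, 895.0, 910.0]
event_z = z_scores(event_values)
print(event_z)
```

By construction the z-scores within an event average to zero, so a boat sitting exactly at the event mean has a z-score of zero.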

In order to evaluate the importance of negative splitting (rowing each section of the course faster than the last), I computed each boat’s overall z-score and its z-score for each of the four splits, within its event. So, the average boat in each event will have a z-score of zero. I then calculated four dummy variables, one for each split of the course, each equal to 1 if that section was the given boat’s fastest piece relative to the other boats in its race. In other words, each boat has exactly one of the four dummies equal to 1, corresponding to the segment of the race in which it had its highest z-score. Finally, I ran several OLS regressions using the boat’s total (overall) z-score as the dependent variable and various combinations of the dummy variables (not all of them, so as to avoid perfect multicollinearity) as independent variables. I also controlled for the boat’s time in each split as well as the gender of the boat and whether or not it received a penalty. This allowed me to isolate, as much as possible, the effect of rowing each of the four splits as the fastest. I hypothesized that rowing the third split, which encompasses the large starboard turn, as the fastest would have the most significant effect on how the boat finished.
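The dummy-variable construction can be sketched in Python as below. The sketch assumes each boat already has four per-split z-scores oriented so that a higher value means a better split relative to the field, as the text describes; the function name and sample numbers are hypothetical, and ties are broken in favor of the earlier split, which is a choice made here rather than something stated in the paper.

```python
def fastest_split_dummies(split_z):
    """Return one 0/1 flag per split; exactly one flag is 1.

    split_z holds a boat's z-score for each split, with higher meaning
    better relative to the other boats in its event (assumed orientation).
    """
    best = max(range(len(split_z)), key=lambda i: split_z[i])  # first max wins ties
    return [1 if i == best else 0 for i in range(len(split_z))]


# Hypothetical boat whose second split was its strongest.
boat_dummies = fastest_split_dummies([0.2, 1.1, -0.3, 0.5])
print(boat_dummies)

# Drop one dummy (here the first split) to serve as the reference category,
# since all four together always sum to 1 and would duplicate the intercept.
regressors = boat_dummies[1:]
print(regressors)
```

Leaving one dummy out means the remaining coefficients are read relative to boats whose strongest split was the omitted one.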

Results

What events are the most competitive?

Overall, the events at the Charles range greatly in their competitiveness, from the most competitive, Men’s Championship 8+ (cv = .0264), to the least competitive, Mixed Adaptive Legs/Trunk/Arms Fours (cv = .2318). This makes intuitive sense because abilities in adaptive rowing vary much more than those of the world-class athletes who participate in the Championship category (this includes the Ivy League crews, the University of Washington, UC Berkeley, and the US National Team). It is important to note, however, that competitiveness is not determined by how fast the race was overall, but rather by how close together the boats were within that event. If I used a measure that compared raw times, I would be comparing apples and oranges because the times vary by boat type, gender, and age; it wouldn’t be fair to compare masters rowers, Ivy League rowers, and youth rowers by raw times alone.

(Times are in seconds; perc.50 is the median official time and med.split is the median time per 500 m.)

Event.Id  Event.Description             perc.50    avg        stnd.dev  cv      m.per.sec  med.split  med.500.split
48        Men’s Championship Eights     895.1970   895.6346   23.6827   0.0264  5.3619     93.2497    1:33.2
46        Men’s Championship Fours      996.1815   1001.8821  26.8831   0.0268  4.8184     103.7689   1:43.8
52        Men’s Lightweight Fours       1068.6260  1069.4185  31.9371   0.0299  4.4917     111.3152   1:51.3
54        Men’s Lightweight Eights      925.7660   932.8354   31.2815   0.0335  5.1849     96.4340    1:36.4
28        Men’s Championship Singles    1144.2315  1150.7084  39.5977   0.0344  4.1950     119.1908   1:59.2
29        Women’s Championship Singles  1252.1060  1262.1984  44.9233   0.0356  3.8335     130.4277   2:10.4

However, that is not to say that the median split is not important. If a coach is deciding between entering a 4 and an 8, he/she needs to be able to judge which boat has the best chance to finish above the median boat (in the top half) so as to maintain the bid. The spread in the median 500 m splits across races doesn’t seem as drastic as the spread in the coefficients of variation. However, the median boat in the Championship 8+ pulled a 500 m split of 1:33 over a 4800m race, which for reference is faster than many collegiate rowers pull over 2000m on the erg. In fairness, erg scores and water times are loosely comparable at best, but the comparison illustrates just how fast those boats can row. There are significant differences between 4s and 8s as far as boat speed, but knowing the median boat’s split in each event should give coaches a rough estimate of where their crews need to be in order to place well.

Competitiveness for Men’s events

Breaking down the results by sex, and factoring in which boats are coxed and which are not, gives a more complete picture of the choice coaches face because, excluding a few mixed events, women and men compete separately. It is interesting to note that four of the five most competitive men’s events are in coxed boats. Moreover, it seems that 4s and 8s for men are generally more competitive than sculled (uncoxed) events like the quad or singles.

Competitiveness for Women’s events

For women, this doesn’t seem to be the case. Three of the five most competitive events for women are actually uncoxed events. This seems a little counterintuitive because, at least in collegiate rowing in the US, a large emphasis is placed on rowing 4s and 8s. Moreover, it is often unsafe to row smaller boats in the Northeast because they are more likely to flip, which is dangerous in the cold.

How should the course be navigated?

I decided to use penalties to analyze the race course itself because in coxed boats they are rarely unforced (especially because I have thrown out the youth divisions in my detailed analysis).

This plot clearly shows that coxed boats are much less likely to commit a penalty.

The Course

The course itself starts at the Boston University (BU) boathouse and continues up river to the park past the Belmont Hill Boathouse. Along the way there are several turns, and the course varies in width as well.


In the plot below, I plotted the different areas that correspond to the various penalty codes reported in the detailed race results.

There are three types of penalties that show up in my analysis: interference or failure to yield (IN), buoy violations (G, R, W), and arch violations (A). I am assuming that the coxswains did their due diligence and studied the rules ahead of time, and therefore none of these errors were unforced. This shouldn’t be a big stretch because I am throwing out the youth coxswains and the masters coxswains and only focusing on the most competitive races in 4s and 8s.

Where do Penalties Happen?

Interestingly, I found that most of the penalties occurred around Anderson Bridge, which is just before the “big turn” and follows the “power house” straightaway. This stretch includes the Weld Boathouse, which is the split point between the second and third splits (the Weld Split and the Cambridge Split). The penalties could be centered here for a few reasons. First, because the boats are staggered at the start, the boats that can pass need time to gain ground on the boats ahead of them. Second, this is in the middle of two turns, one to port and the other to starboard, which creates a corkscrew. Coxswains may therefore be trying to minimize the distance rowed by cutting across more of the boats to gain position for the “big turn” to starboard. Finally, the “big turn” is infamous because it is both very narrow and very long, so getting the inside line on the buoys is very important. This could be causing coxswains to take aggressive lines to set up a good turn.

How should the course be rowed?

Overall, there wasn’t much variation between the coefficients of variation when broken up by section of the race. After taking the median and average of the coefficients of variation across races, the median coefficient of variation was lowest at the beginning of the race and highest in the third split (Cambridge). There were varying degrees of right skew in the distribution for all the splits (the mean was greater than the median). However, in most races, there was greater variation in the third split, which encompasses the “big turn”. This supports the results found in the penalty data: boats are jostling for a good line around Anderson Bridge, which leads to greater variation in the Cambridge split and more penalties around Anderson.

split  avg.cv  med.cv  group
s1.cv  0.0488  0.0432  1
s2.cv  0.0484  0.0448  2
s3.cv  0.0487  0.0469  3
s4.cv  0.0472  0.0451  4

Even so, the question of how best to row the course has not completely been answered. Do boats negative split the course like rowers should on the erg? Is performance in one portion of the race determinant of performance overall?

Z-score Analysis

In order to investigate this relationship I used a simple OLS multiple regression. I controlled for boat speed, penalties, and sex in order to look at which section of the course had the most influence on the boat’s relative position in the race as shown by its z-score. I found that boats that rowed their best split at the start (the z-score of their start was higher than that of their other three splits) did better (Adjusted R-squared = .5135). The F-statistic for the model was significant at the 1% level. Interestingly, the coefficient for boats that rowed their third split as their fastest was the smallest of the three dummies and was not significant at the 1% level (p = 0.037). Overall, it appears that boats that were able to do relatively better in the beginning of the race, from the start line to Riverside Boathouse, did better than boats that had their better sections towards the end.

Term          Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   -9.653    0.561       -17.201  0.000
s1.total.sec  -0.043    0.007       -6.222   0.000
s2.total.sec  0.038     0.006       6.476    0.000
s3.total.sec  0.009     0.011       0.786    0.432
s4.total.sec  0.033     0.013       2.493    0.013
s2.fastest    -0.538    0.123       -4.355   0.000
s3.fastest    -0.291    0.139       -2.096   0.037
s4.fastest    -0.390    0.135       -2.876   0.004
female        -1.389    0.124       -11.188  0.000
pen.y.n       0.718     0.193       3.726    0.000
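For readers who want to see the mechanics of a regression like the one reported above, here is a minimal pure-Python sketch of OLS via the normal equations. It is not the code behind the table (the analysis was run in R); the function name and the toy data are hypothetical, and the example only demonstrates that the fit recovers known coefficients and that including every dummy alongside an intercept is rejected as perfectly collinear.

```python
def ols(X, y):
    """OLS coefficients [intercept, b1, b2, ...] from the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]   # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("regressors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Toy data generated from y = 1 + 2*x1 - 3*d, where d is a 0/1 dummy.
X_toy = [[0, 0], [1, 0], [2, 1], [3, 1], [4, 0]]
y_toy = [1 + 2 * x1 - 3 * d for x1, d in X_toy]
coefs = ols(X_toy, y_toy)
print([round(c, 6) for c in coefs])
```

In the table above, the first-split dummy is the omitted category, so the three reported dummy coefficients are measured relative to boats whose strongest split was the first.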

This is indicative of just how important a good line is for the Charles. Boats that are able to get a good line, even though they might be out-pulled later on in the race, are able to finish better than boats that started off slower. An interesting nuance of rowing could also play into this result. The Charles is a head race, and therefore in order to see another boat a crew needs to either catch one or be caught. Seeing another boat, especially passing one, is a huge motivator for a rower, so it makes sense that boats that were able to row better in the beginning, alone, were able to catch boats and were motivated to row faster. Moreover, as boats were being passed, they also had increased incentive to row faster because they know just how embarrassing being passed is. In addition, if a boat can row faster in the beginning, it is set up for a better line at the Anderson Bridge and therefore on the “big turn”. So, to answer the original question, positioning matters a lot in the Charles, and it could therefore be wise to try to get a strong start while motivation is lower and trust that a crew won’t “fly and die” like the University of Delaware Men’s Collegiate 4 (see appendix).

avg.1.fast  avg.2.fast  avg.3.fast  avg.4.fast
0.324       0.235       0.187       0.254

In the above table, we can see that 32.4% of boats rowed the first split as their fastest, whereas only 18.7% of boats rowed the “big turn” (Split 3) as their fastest split relative to other boats.
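Shares like these are simply the column means of the four fastest-split dummies. The Python sketch below shows that calculation on a few invented rows; the function name and data are hypothetical and do not reproduce the paper's sample.

```python
def fastest_split_shares(dummy_rows):
    """Fraction of boats whose fastest split was split 1, 2, 3 and 4."""
    n = len(dummy_rows)
    n_splits = len(dummy_rows[0])
    return [sum(row[i] for row in dummy_rows) / n for i in range(n_splits)]


# Four hypothetical boats, each with exactly one fastest-split flag set.
sample_rows = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
shares = fastest_split_shares(sample_rows)
print(shares)
```

Because every boat has exactly one flag set, the four shares always sum to one, as they do in the table above.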

Discussion

There are a few important details to note about the results of this analysis. First and foremost, weather was implicitly assumed to be constant for each race, which is not necessarily true. Each race took around sixteen to twenty minutes and the boats started at different times. Therefore the winds could have shifted during the race and affected different crews in different ways. Rowing in a headwind can help set up heavier boats in the water but hurt lighter crews, whereas tailwinds can help lighter crews more than heavier crews. Cross winds tend to help nobody. Over the three days of the Charles, the wind, temperature, and current were all in flux, and analyzing how the conditions affected each race is a very inexact science. Moreover, there is no way to know what the conditions will be like in any given year. Therefore, it would be good to repeat the regression including data from previous years in order to include time fixed effects.

In addition to weather data, it would be interesting to look at how individuals’ ages affected their performance at the Charles; however, the data on ages was incomplete. Rowing experience, erg scores, weight, height, and a number of other physical characteristics would be interesting to look at in order to see what types of athletes perform best in the race. Unfortunately, this information has not been collected, although a rower survey could be an interesting addition to the Charles in the future.

Conclusion

Overall, this paper shows that a large amount of thought and preparation, on the part of the coach, needs to happen in order to have a successful race. Entering your rowers in competitive boats is step one. Then knowing how to instruct your rowers and coxswain to tackle the race itself is incredibly important. It is apparent that boats that are able to take advantage of the power house straightaway in the first split of the race are able to get better lines and positions that allow them to finish higher in the distribution. Moreover, passing for position around Anderson Bridge can be seen in the concentration of penalties at that point in the course. All in all, the course has a dramatic effect on how the race is best rowed and it is not necessarily best to negative split if it means giving up the best line.

Acknowledgments

Prof. Albert Kim, it is impossible to cite the amount of code used from lectures and office hours. Coach Adam Askham for assisting with penalty interpretations and developing research questions. Brendan Mulvey, The Head of the Charles Regatta, for providing details on penalty codes and umpire location.

Appendix I: Notes on Data

The csv files pulled from Regatta Master had numerous formatting irregularities that made them impossible to directly read into R. Several extra rows and columns were added in order to make the sheet easy to look at, but not easy to input into R. Therefore, I used the basic steps (outlined below) to alter each sheet. A single VBA macro could not be written because each sheet was formatted differently.

  1. Clear all formats
  2. Copy event number
  3. Paste event number in every cell of the first column (already empty)
  4. Delete all event information in the top rows
  5. Paste standardized list of new column names to make selecting them in R more uniform (Leaving the extra columns because a simple select() command will sort them out later)
  6. Read into R as a csv
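The manual steps above could in principle be scripted. The Python sketch below is only an illustration under strong assumptions: it supposes that each export has a known number of event-information rows at the top, an empty first column, and a layout that matches a supplied list of standardized column names. The function name, parameters, and sample text are all hypothetical, and since the text notes that each sheet was formatted differently, the number of rows to skip would need to be set per sheet.

```python
import csv
import io


def clean_event_rows(raw_text, event_id, skip_rows, column_names):
    """Drop the top rows, fill column 1 with the event number, name the columns."""
    rows = list(csv.reader(io.StringIO(raw_text)))[skip_rows:]
    cleaned = []
    for row in rows:
        if not any(cell.strip() for cell in row):
            continue                      # skip spacer rows that are entirely empty
        row = [str(event_id)] + row[1:]   # the first column is assumed to be empty
        # zip() ignores any columns beyond the standardized names
        cleaned.append(dict(zip(column_names, row)))
    return cleaned


# Invented sample resembling an export with three rows of event information.
sample_text = (
    "Event 48,Example Event,,\n"
    ",,,\n"
    ",Place,Crew,Time\n"
    ",1,Crew A,14:20.1\n"
    ",2,Crew B,14:25.7\n"
)
sample_clean = clean_event_rows(
    sample_text, 48, 3, ["event_id", "place", "crew", "time"]
)
print(sample_clean)
```

Dropping the unnamed trailing columns here plays the same role as the later column selection in R mentioned in step 5.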

Appendix II: How Did Midd Do?

In the 51st Head of the Charles, Middlebury College entered two men’s collegiate fours and one women’s collegiate 8.

The above interactive histogram shows the distribution of official times for the Men’s Collegiate 4s. Middlebury’s first boat finished in 18th and retained its bid by finishing above the median boat. The second men’s four finished 34th and lost its bid for the 52nd HOCR. The black line represents the median boat, and the blue lines show the two Middlebury 4s.


The above interactive wrap of distributions shows how the distribution changed throughout the race. Interestingly, the second Middlebury 4 did very well on the “big turn”, most likely because it got a favorable line.

Interestingly, Middlebury’s top boat had a very good 2nd split (Weld).

The women finished at the median and retained their bid for the 52nd Charles.
