Despite advances in hardware and software technology, and a fancy price tag, the optical wrist-based heart rate monitor on the new Garmin Fenix 5 did not perform well in objective tests in real-life situations compared to an older Garmin model using a chest strap heart rate monitor. The optical monitor on the Fenix provided erratic measurements, showed large deviations from the chest strap data, and in many circumstances did not cross-correlate well with the chest strap data. Despite my hopeful enthusiasm, the optical HR technology is not quite far enough along for me to justify spending such a high price on the Fenix 5.
I get device envy a lot. A lot. I’m not an elite athlete. I’m a (slow) casual jogger at best, but when a new shiny toy becomes available, it catches my eye. I posted on Saturday, June 18 in the daily Q&A thread in /r/Running asking for confirmation that I was crazy for ordering the new Fenix 5. I already have a perfectly functional Forerunner 230 with a perfectly functional chest strap that does just about everything I need or want it to do.
But two things.
I like wearing the 230 as an activity tracker during the day to make sure I get enough movement in and don’t sit like a sloth at my desk; however it’s not the most handsome watch in the world, and the Fenix 5 is pretty.
I am/was curious about the 24/7 heart rate monitoring feature on the Fenix 5. Not because I need it, but simply for curiosity’s sake. I’m somewhat of a data nerd, and like to track things, and seeing decreases in my resting heart rate would both give me some data-driven confirmation that training is working and hopefully motivate me to keep on working.
Note, like I said, that the 230 with the heart rate strap does pretty much all I need for training. I’ve had no problems with the GPS (except when running in cities between tall buildings), and the chest strap seems to give me pretty reliable and useful data for my training sessions.
The Fenix 5 would just be a prettier watch for daily use, and hopefully give me useful HR monitoring data 24/7 and without the chest strap. But it’s pretty danged expensive at $600/pop.
Also, I’ve had issues with wrist-based HR monitoring before. My very first smart watch a few years ago was a Fitbit Surge that I returned after about a month because the HR values were so horribly off. Sitting completely still it would jump around between 55 and 110bpm, and during exercise it would consistently be 30-40bpm off of my old Polar chest strap (pre-smart watch) that had served me well for about a decade. I attributed it to a combination of wrist-based optical HR technology just not being good enough yet and my see-through ginger skin not being optimal for optical HR monitoring.
/u/skragen was the first to respond and reminded me that, as long as my 230 is still doing well by me, I could get my next five pairs of shoes for the price of the Fenix 5. This is obviously the smartest, most logical response. But the conversation that ensued suggested that perhaps wrist-based optical HR monitoring had improved since my Surge days, and that the Fenix might do better. Also, it dawned on me that see-through ginger skin should be better for optical monitoring than normal people’s skin, since it is… well… see through.
Note that I had done all of the normal things to make sure the Surge monitor was reading correctly: e.g., trying it in different places on my wrist, on both arms, making sure it’s tight enough, making sure the sensors were clean, etc. It just gave me really crappy readings.
But, before ordering the Fenix 5, I read DC Rainmaker’s review of the Fenix 5, which was overall very positive. Importantly for me, his tests of the HR monitoring function during exercise showed that it was pretty darned good during running, or at least lined up well (in most circumstances for running) with the data provided by a Fenix 3 paired with a chest strap HRM and a Suunto Spartan Ultra paired to a Scosche Rhythm+.
However, like I mentioned above, I had a horrible time getting reliable readings from my Surge and wanted to verify that the Fenix 5 would work well for me, just like it worked for DC Rainmaker, given my ginger skin issues. Also, DC Rainmaker tested it under exercise conditions, not under normal daily conditions. This was of interest to me, since, as I mentioned above, the 24/7 monitoring was one of the main reasons I ordered it. DC Rainmaker notes “When it came to 24×7 mode, the new data looks much better, and the accuracy seems spot on for casual activities like watching TV, walking, or just living life.” But there were no graphs, and I like graphs.
Although after my back and forth with /u/skragen I was dead set on just returning the Fenix 5 to Amazon unopened, it showed up earlier than I anticipated (yesterday, June 18, 2017) so I figured I’d put it head-to-head with my 230-plus-HR strap under a variety of conditions, including just doing normal things around the house, kicking it in front of the TV, and during a basic workout on the elliptical trainer.
So, below is a report of what I found. I hope it’s helpful to some of you out there.
First, I want to put out there that the goals of what I report below are fairly limited. My goal is only to do a head-to-head comparison of my 230-plus-chest strap HR monitor with the optical wrist monitor on the Fenix 5 under some realistic conditions. I’m not looking at GPS accuracy, pace tracking, or any peripherals. This is also not a full review, and I’m not going to talk about the bells and whistles (the Fenix obviously wins here, as it does in attractiveness), though I’ll make some comments below about general functioning as relevant.
Once my package arrived from Amazon, I unpacked it, got the Fenix set up, and started it charging. Using the Garmin Express app on my laptop, I made sure all of the software/firmware was up to date, and imported my user settings (height, weight, resting heart rate, max heart rate, etc). In this way, everything that the watch knows about me should be identical to what my 230 knows about me. I also shined up the sensors on the back, and then strapped it on.
I usually wear my watches on my left wrist, but I put this one on my right wrist, because I have a small mole on my left wrist in the vicinity of where the sensors would sit. So to avoid any possible contamination from that, the Fenix went on my right wrist and my 230 stayed on my left wrist where it normally lives.
For the chest strap, I wet the contacts prior to each recording, so the contact should be quite good, at least for the first 15-20min of each recording, though there might be some drying toward the ends of the recordings (except for Recording 3, where I was sweaty).
I know positioning is a big deal for optical HR monitoring, so I made sure I re-read Garmin’s page on wearing the device properly and avoiding erratic HR readings.
The watch got put on about 3" above my right wrist bone. Here is a picture of where it’s resided for the last two days.
That’s high enough that it shouldn’t get interference from my wrist bone, but low enough that it still looks like I’m wearing a watch. You can also see that: 1) I don’t have hairy arms, so hair interference shouldn’t be an issue, and 2) my skin is indeed gingery see-through.
Also, the silicone band that came with the watch has some stretch to it, so that I could make sure the fit was snug enough not to jiggle around, even with motion. I did several jiggle tests, and no jiggling noted.
Because I was only interested in HR data, I used the “Bike Indoor” activity on both watches – this will record HR data, but won’t bog down the watch with recording GPS data, pace, or other things I wasn’t interested in.
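I did five total recordings:
1. Housework, followed by a stroll around the neighborhood.
2. Couch potato time, then getting ready for the gym, driving there, and walking from the car to the gym.
3. A 30min workout on the elliptical trainer.
4. More couch potato time, with almost no motion other than typing.
5. A half-mile walk to the coffee shop, then sitting down to work on this write-up.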
Figures referring to the respective recordings will use those numbers, so cross-reference back up here.
So, recordings 1, 2, 4, and 5 are basically “normal everyday circumstances” which should be relevant to 24/7 monitoring. Note also that these activities involved far less arm motion than my elliptical workout. During the workout I didn’t hold on to the handles except a few times to cross-check my heart rates from the two devices with the metal grip heart rate monitors on the elliptical machine (so my wrists were stabilized only for a few total seconds of the 30min workout). Otherwise, during walking there was only a gentle arm swing, during hanging out on the couch there was some typing, and during housework there was maybe a little more movement and twisting, but not nearly as much movement as during the workout.
Why did I do a workout on the elliptical and not a “real” workout like running?
Good question. The short answer is that yesterday was my first workout in months. I had about 2 months of rest after an IT band injury in a half marathon in January, and then I had a months-long illness that kept me from exercising. I’m finally better and just getting back into conditioning. I’ve been doing a lot of walking lately, and my calves are still sore from that (it’s amazing how quickly you atrophy after no running), so my joints aren’t ready for a run yet. I’m taking baby steps in getting back into my workout regimen so that I don’t get injured again.
That said, the elliptical machine allowed me to have arm movements similar to a regular jogging workout. So the results should translate well.
After each recording, I synced both devices with the Garmin Connect app on my iPhone. I then downloaded the data files from the Garmin Connect website in .TCX format, which is just an XML document. To get a quick parse of the XML files into easily readable time series, I used this website to convert the TCX files to CSV, did some further formatting in Excel, and finally did the data analysis (and this write-up) in RStudio.
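For anyone who would rather skip the converter website, the TCX-to-CSV step can be sketched in a few lines of Python. This is not what I actually ran (I used the website, Excel, and R); it is a minimal sketch that assumes the standard Garmin TCX v2 layout, where each `Trackpoint` holds a `Time` and an optional `HeartRateBpm/Value`. The function names and the sample data are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace used by Garmin's TCX (Training Center XML) v2 schema.
TCX_NS = {"tcx": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}

# Made-up three-sample file; the middle trackpoint has no HR (a strap dropout).
SAMPLE_TCX = """<TrainingCenterDatabase
    xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Activities><Activity Sport="Biking"><Lap StartTime="2017-06-18T15:00:00.000Z"><Track>
    <Trackpoint><Time>2017-06-18T15:00:00.000Z</Time>
      <HeartRateBpm><Value>72</Value></HeartRateBpm></Trackpoint>
    <Trackpoint><Time>2017-06-18T15:00:01.000Z</Time></Trackpoint>
    <Trackpoint><Time>2017-06-18T15:00:02.000Z</Time>
      <HeartRateBpm><Value>74</Value></HeartRateBpm></Trackpoint>
  </Track></Lap></Activity></Activities>
</TrainingCenterDatabase>"""


def tcx_to_rows(tcx_text):
    """Return (timestamp, bpm) pairs; bpm is None when a trackpoint has no HR."""
    root = ET.fromstring(tcx_text)
    rows = []
    for point in root.iterfind(".//tcx:Trackpoint", TCX_NS):
        time = point.findtext("tcx:Time", default=None, namespaces=TCX_NS)
        bpm = point.findtext("tcx:HeartRateBpm/tcx:Value", default=None,
                             namespaces=TCX_NS)
        rows.append((time, int(bpm) if bpm is not None else None))
    return rows


def rows_to_csv(rows):
    """Write the rows as CSV text, leaving the bpm field empty for dropouts."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["time", "bpm"])
    for time, bpm in rows:
        writer.writerow([time, "" if bpm is None else bpm])
    return out.getvalue()


print(rows_to_csv(tcx_to_rows(SAMPLE_TCX)))
```

Keeping dropouts as empty fields (rather than silently skipping them) makes it easy to see where the strap cut out when the two devices are lined up later.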
Here are plots of the time series data from the 5 recordings:
The first thing to notice is that the two devices tracked each other fairly well in Recordings 3 and 4. Recording 3 was my elliptical workout, which featured the most arm swing, so should have been the hardest for the devices. Recording 4 was me being a couch potato with almost no motion but typing. Nonetheless, there are a few big spikes in the Fenix numbers during Recording 4, despite there being minimal motion on my part.
The second thing you might see is that there is some missing data in the Forerunner recordings. The strap cut out a few times in each of the recordings, which led to no HR recording for some of the samples.
Recording 1 starts out all over the place (doing housework), but then things get a little better toward the end (going for a stroll around the neighborhood on a relatively humid, sweaty day).
In Recording 2, the first part is pretty good, but then things go bonkers. The first part involved me being a couch potato, but the second part involved movement. You’ll see a chunk of missing data for the Forerunner there. That was me getting changed for the gym, and I took the strap off for a minute (and re-wet it). But things don’t improve after that. The next portion is me walking around the house getting ready to go and gathering up some things. You’ll see my Forerunner HR hangs around 100, which is typical for me doing moderate activities around the house, it seems, but the Fenix numbers are way high. Then there’s a big drop in the Forerunner number – that coincides with me getting in the car and driving to the gym; but the Fenix number stays high for a while before it finally calms down. The end of the recording shows a pick-up in HR, where I’m walking from my car to the gym, but the Fenix way over-shoots in that case. Note that there was no crazy arm movement there – just me walking normally. And I say that the Fenix overshoots (and not that the Forerunner undershoots) because the Forerunner numbers track really well with my subjective impression of effort – I was walking moderately on pretty flat land (central Illinois here).
Recording 5 starts with me walking moderately the half mile to the coffee shop (where the Forerunner HR hangs at or just below 100 again, consistent with the other walking data), but the Fenix is again way high. Again, no crazy arm motions here – just walking. I noticed the numbers were weird there, so I checked to make sure the Fenix hadn’t slipped – it hadn’t. It was still snug in place. Then you can see my HR numbers come down a little as I sit down and start working on this document. Both devices showed the numbers coming down, but they really didn’t track each other very well, at least based on visual inspection of the data.
But visual inspection is only one thing. Let’s quantify some things.
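To look at how well the two devices track each other, I’m going to look at two metrics. The first is the root mean square (RMS) of the sample-by-sample differences between the two devices. RMS here is essentially the standard deviation of those differences around zero: square each difference, average the squares, and take the square root. The RMS values are in the table below: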
Recording | RMS (bpm) |
---|---|
1 | 9.85 |
2 | 15.20 |
3 | 6.15 |
4 | 4.00 |
5 | 10.20 |
What this means is that the typical (root-mean-square) difference between the two devices across samples was about 9.9bpm in Recording 1, 15bpm in Recording 2, etc. So this metric confirms the suspicion we had above that Recordings 3 and 4 were the best, but the others didn’t fare so well. I’d consider a sampling difference of 4-5bpm to be within justifiable error. But a difference of 10bpm to me says something isn’t quite right. For people who are sticking to a strong HR-based training regimen (e.g., MAF), this amount of error is unacceptable. I don’t do MAF training, but if I’m going to spend $600-700 on a device, I want good data that will let me keep with my chosen training plan. You’ll see that many pretty benign circumstances (e.g., just walking gently) led to reading differences of 20-30bpm. I don’t consider that acceptable.
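If you want to compute the same kind of number for your own devices, here is a minimal Python sketch of the RMS calculation (again, not the R code behind the table; the function names and the toy data are made up). It assumes the two series are already aligned sample-by-sample and that dropouts are stored as `None`, and it skips any sample where either device has no reading.

```python
import math


def paired_differences(chest, wrist):
    """Wrist minus chest for every sample where both devices have a reading."""
    return [w - c for c, w in zip(chest, wrist)
            if c is not None and w is not None]


def rms_difference(chest, wrist):
    """Root mean square of the sample-by-sample differences."""
    diffs = paired_differences(chest, wrist)
    if not diffs:
        raise ValueError("no overlapping samples")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mean_abs_difference(chest, wrist):
    """Plain average of the absolute differences, for comparison."""
    diffs = paired_differences(chest, wrist)
    if not diffs:
        raise ValueError("no overlapping samples")
    return sum(abs(d) for d in diffs) / len(diffs)


# Toy data: the wrist agrees with the strap except for one 20bpm spike.
# The third sample is a strap dropout, so it is skipped entirely.
chest = [100, 100, None, 100, 100]
wrist = [100, 100, 120, 100, 120]

print(rms_difference(chest, wrist))       # 10.0
print(mean_abs_difference(chest, wrist))  # 5.0
```

The toy example also shows why RMS is a stricter score than a plain average: squaring the differences means a few big spikes count for more than lots of small wobbles.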
The second way to quantify the differences is to use cross-correlations for the two time series to check for correspondence at different lags. Cross-correlation plots for the 5 time-series are below (samples with missing Forerunner data are omitted):
First, note that the magnitude of the correlations between the data recorded by the two devices varies greatly across the different recordings: the strongest correlations were found in Recording 3, which matches with what we see in the time series plots. The others don’t fare so well.
Two other points to note in the cross-correlation plots:
The distributions of the cross-correlations differ greatly across recordings. Recordings 1, 3, and 4 have their best correlations close to lag-0, and show a clear peak in the distributions. This means that the two devices tracked together in time (though how well they tracked is indicated by the magnitude of the r-value, which varied across the recordings). In Recordings 2 and 5, the distributions were relatively flat. This is pretty important. It means that a reading from one device (say, the Fenix) taken up to about 25 samples ahead of or behind the present sample is almost as good a predictor of the present reading on the other device (say, the Forerunner) as the simultaneous reading is. This tells me that there is a lot of noise in the data. In my mind, you’d want your devices to follow along with each other. That isn’t happening consistently here.
The lag position of the peak correlation value is sometimes positive and sometimes negative. This means that one device doesn’t consistently lag behind the other; it’s really variable. With a $600-700 device, I don’t want variability; I want accuracy.
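To make the lag idea concrete, here is a minimal Python sketch of a lagged correlation (not the R code that produced the plots, and the numbers will not match a packaged cross-correlation function exactly, since this version recomputes a plain Pearson r on whatever samples overlap at each lag). The function names, the sign convention, and the toy data are all assumptions for illustration. Samples where either device has no reading are dropped.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # a flat series has no defined correlation
    return cov / math.sqrt(vx * vy)


def lagged_correlations(chest, wrist, max_lag):
    """Map lag -> r, pairing chest[t] with wrist[t + lag].

    A positive best lag means the wrist series trails the chest series.
    """
    n = min(len(chest), len(wrist))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(chest[t], wrist[t + lag])
                 for t in range(n)
                 if 0 <= t + lag < n
                 and chest[t] is not None
                 and wrist[t + lag] is not None]
        if len(pairs) < 3:
            continue  # too few samples to say anything
        xs, ys = zip(*pairs)
        result[lag] = pearson(xs, ys)
    return result


# Toy data: the wrist series is the chest series delayed by two samples.
chest = [60, 62, 65, 70, 78, 85, 90, 88, 84, 80, 75, 71, 68, 66, 65, 64]
wrist = [60, 60] + chest[:-2]

corr = lagged_correlations(chest, wrist, max_lag=5)
best_lag = max(corr, key=corr.get)
print(best_lag)  # 2
```

A clean, consistent delay like this one gives a sharp peak at a single lag. A flat curve, or a peak that jumps between positive and negative lags from one recording to the next, is what noise looks like.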
Finally, a quick look at the training metrics that Garmin Connect provided for each of the recordings. Garmin estimates the number of calories you burned from a combination of your heart rate and your physical input. Remember that both devices have the exact same information about me, so they should line up well if the devices are tracking the same information.
Calories per recording
Recording | Forerunner | Fenix |
---|---|---|
1 | 113 | 222 |
2 | 76 | 167 |
3 | 258 | 384 |
4 | 107 | 208 |
5 | 105 | 199 |
The Fenix estimated roughly 90-125 more calories than the Forerunner in every recording, even the couch potato sessions. Something seems fishy there.
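The per-recording gaps can be computed straight from the table above. A quick Python sketch (the variable names are just for illustration):

```python
# Calories per recording from the table: (Forerunner, Fenix).
calories = {
    1: (113, 222),
    2: (76, 167),
    3: (258, 384),
    4: (107, 208),
    5: (105, 199),
}

# Extra calories the Fenix reported, and the Fenix/Forerunner ratio.
gaps = {rec: fenix - forerunner
        for rec, (forerunner, fenix) in calories.items()}
ratios = {rec: fenix / forerunner
          for rec, (forerunner, fenix) in calories.items()}

for rec in calories:
    print(f"Recording {rec}: +{gaps[rec]} kcal, {ratios[rec]:.2f}x")
```

The gaps run from 91 (Recording 2) to 126 (Recording 3). In ratio terms the Fenix reported close to double the Forerunner's number in the four everyday recordings, and about one and a half times as much for the workout.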
Although the Forerunner+chest strap and Fenix 5 optical sensor did a fairly good job of tracking each other during my workout (Recording 3), they did pretty poorly at other times.
The workout tracking fell within what I would call an acceptable margin of error. The biggest problems were at the beginning of the recording, which aligns well with what DC Rainmaker found in his review.
But for other daily activities, the tracking didn’t meet a standard that I would consider acceptable. In Recordings 1, 2, and 5, the readings from the two devices were roughly 10-15bpm apart (RMS), and they didn’t track together well.
Probably neither is 100% right. The only way to get a completely accurate recording is with a medical-grade ECG monitor. We’re working with consumer-level tools here. But my subjective feeling is that the chest strap was more accurate. I say this because its readings were generally less erratic, and lined up with my perceived level of exertion. E.g., when I was taking a stroll, the chest strap registered 90-105bpm, which is about right, whereas the Fenix sometimes registered 110-125bpm, and not just for a sample or two, but for several minutes at a time (e.g., in Recording 2 during driving – don’t worry, I’m a boring driver – and in Recording 5 during my walk to the coffee shop).
Note that I don’t think the differences are due to my strap drying out (which might cause it to behave improperly or stop recording). Sometimes the best tracking was at the start of a recording, when the strap had been freshly moistened, and sometimes it was at the end, when the strap presumably should be dryer.
Finally, during Recording 3 (the workout), when I noticed big differences between the two devices, I used the metal hand-held sensors on the elliptical machine to corroborate one of the devices. This always supported the reading from the Forerunner 230, not the Fenix.
No. Given the weirdness in measuring my heart rate during normal daily tasks with the Fenix, it’s not worth the money. My 230+chest strap works just fine for my purposes, and seems to be more accurate than the optical monitor on the Fenix 5. Note that you can get a chest strap to pair with the Fenix, but that seems to defeat the purpose of having the HR monitor built in to the watch, and doesn’t get around the other issues with poorly measuring HR during daily tasks (even when there’s not a lot of arm motion happening).
That said, the Forerunner chest strap signal did drop a few times in each recording, so there was some missing data. The Fenix consistently recorded data, though it was spiky and was plain inaccurate at times. But note that the chest strap never dropped for more than a couple of seconds, and sometimes for just 1 or 2 samples. I’ll take a few missing data points over inaccurate and erratic readings.
If you have $600 burning a hole in your pocket and aren’t concerned with how accurate the 24/7 HR monitoring is, sure, why not? But I’d suggest getting the additional chest strap to measure HR during activities, both because it’s more accurate than the optical monitor and because the new-generation straps from Garmin collect a lot more running-related data than my older HR-only strap. The Fenix also has a lot of neat data features to tinker with, fed by the chest strap (or other pairable peripherals), that aren’t available on the 230.
Note that you could have much better luck with the optical monitor than I have. This one seemed to do better than the Surge from a few years ago for me, but still not as well as it seemed to do for DC Rainmaker. It could be differences in the way we tested the products, or it could just be something about me that doesn’t work well with optical HR monitors – see-through ginger skin behaves strangely with all sorts of things. So even though you’d think that see-through skin would allow the optical monitor to track what’s going on underneath, maybe it just doesn’t work like that. So, you might get much more accurate readings than I did.
But, for my purposes the 230 works just fine, and hopefully will get me through a few more years. Optical monitors are always improving, so hopefully the next generation will be good enough to lure me in.