Confirmation Bias and Extraordinary Claims

I was inspired recently by a Royal Institution YouTube video featuring mathematician Matt Parker talking about “What Happens When Maths Goes Wrong?”. In the video he describes a news story he read which claimed that mapping ancient megalithic sites in the UK produced isosceles triangles that were “so precise they could not have happened by accident, in fact they couldn’t rule out extraterrestrial help”. The researchers speculated that the triangles were a sort of ancient satnav used to navigate across the UK.

So Matt Parker decided to check the math. He plotted the locations of former Woolworths stores on a map and put out a response story showing that the “ancient Woolworths stores” also formed perfect triangles. His point was that if you have enough sites, the probability of finding perfect triangles is so high that you should be very surprised if you didn’t find any. According to Parker, the 1500 sites the original researchers were studying yield 561 million possible triangles. Statistically speaking, some of them are bound to be geometrically interesting. Parker put it this way: “Mathematically we can state, ‘you can find any pattern you want, to any level of precision you want, if you’re prepared to ignore enough data’.”
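Parker’s figure is just the number of ways to choose 3 sites out of 1500. A quick sanity check in R, plus a small simulation of the underlying point (the 1% side-length tolerance and the 50 random “sites” below are my own illustrative choices, not Parker’s):

```r
# Number of distinct triangles you can draw through 1500 sites:
choose(1500, 3)
#> [1] 561375500

# Scatter random "sites" on a unit square and count how many of the
# resulting triangles are isosceles to within a 1% side-length tolerance:
set.seed(42)
n   <- 50                        # choose(50, 3) = 19600 triangles
pts <- matrix(runif(2 * n), ncol = 2)

trios <- combn(n, 3)             # every possible triangle, one per column
isosceles <- apply(trios, 2, function(i) {
  s <- sort(as.numeric(dist(pts[i, ])))   # the three side lengths
  (s[2] - s[1]) / s[2] < 0.01 || (s[3] - s[2]) / s[3] < 0.01
})
mean(isosceles)                  # fraction that are "suspiciously precise"
```

Even with only 50 purely random points, some fraction of the 19,600 triangles will come out isosceles to within 1% by chance alone; with 1500 sites and 561 million triangles, finding a handful of striking ones is all but guaranteed.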

YouTube Screenshot of Ancient Woolworths Stores

This got me thinking about how many research findings and claims we read on a daily basis, and how uncertain those findings can be if the researchers aren’t well trained in basic probability and statistics. But even when the researchers are well trained, things can still go wrong. An article I read recently on FiveThirtyEight, titled “Science Isn’t Broken,” stated that “Stanford meta-science researcher John Ioannidis concluded, in a famous 2005 paper, that most published research findings are false.” That is shocking to think about and doesn’t lend much credence to the title of the FiveThirtyEight article! But the article goes on to say that science itself isn’t broken. The problem lies (according to the author) in the fact that “Science is hard — really fucking hard”.

As a science, data science is no different. If statistical methods are not properly understood and applied, or if data is mishandled, or worse yet fabricated, then the findings are meaningless. FiveThirtyEight shows just how easy it is to intentionally or unintentionally p-hack data with an online tool called “Hack Your Way To Scientific Glory” (available within the Science Isn’t Broken article), which can give you opposite findings depending on your political leanings and on which data you choose to include in or exclude from the analysis.
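To see how this works without the interactive tool, here is a minimal sketch of the same idea (a pure-noise simulation of my own, not FiveThirtyEight’s data or method): if an analyst is free to try enough alternative specifications, some will clear p < 0.05 by chance alone.

```r
# Pure noise: the "predictor" and "outcome" are unrelated by construction,
# yet trying many specifications still turns up "significant" results.
set.seed(538)
n_specs <- 40                            # analyst degrees of freedom
p_vals  <- replicate(n_specs, {
  x <- rnorm(100)                        # a random "predictor"
  y <- rnorm(100)                        # a random, unrelated "outcome"
  summary(lm(y ~ x))$coefficients[2, 4]  # p-value on the slope
})
min(p_vals)          # the result a motivated analyst would report
mean(p_vals < 0.05)  # expected to be ~5% in the long run, on pure noise
```

The honest summary of this experiment is the full distribution of p-values; reporting only the minimum is exactly the p-hacking the tool demonstrates.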

Screenshots of the tool’s two possible headlines: “Democrats are GOOD for the Economy” / “Democrats are BAD for the Economy”

This is exactly the point made by Matt Parker: “you can find any pattern you want, to any level of precision you want, if you’re prepared to ignore enough data.” What the FiveThirtyEight article suggests, however, is that the majority of problems stem not from researchers who intentionally produce misleading results, but from natural human biases. As the author puts it, “The important lesson here is that a single analysis is not sufficient to find a definitive answer. Every result is a temporary truth, one that’s subject to change when someone else comes along to build, test and analyze anew.”
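The arithmetic behind “a single analysis is not sufficient” is easy to make concrete. At the conventional 0.05 threshold, the chance that at least one of k independent tests of true null hypotheses comes up “significant” is 1 − 0.95^k (a textbook familywise error calculation, not a figure from the article):

```r
# Chance of at least one false positive among k independent tests at alpha = 0.05:
k <- c(1, 5, 20, 100)
round(1 - 0.95^k, 3)
#> [1] 0.050 0.226 0.642 0.994
```

By 20 tests it is more likely than not that something spurious looks significant, which is why replication across many studies, not any single result, is what settles a question.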

The main point of the article is probably best summed up by this paragraph:

People often joke about the herky-jerky nature of science and health headlines in the media — coffee is good for you one day, bad the next — but that back and forth embodies exactly what the scientific process is all about. It’s hard to measure the impact of diet on health, Nosek told me. “That variation [in results] occurs because science is hard.” Isolating how coffee affects health requires lots of studies and lots of evidence, and only over time and in the course of many, many studies does the evidence start to narrow to a conclusion that’s defensible. “The variation in findings should not be seen as a threat,” Nosek said. “It means that scientists are working on a hard problem.”

So what’s my point? Am I saying that you can’t trust science or anything that you read? No. My point is that you should read everything with a critical eye and apply your knowledge of probability and statistics before you judge its accuracy or credibility. Keep in mind confirmation bias, confounding variables, and all the other possible sources of confusion and inaccurate analysis in a study. And never forget the Sagan standard: “extraordinary claims require extraordinary evidence”.

Published on RPubs at: http://rpubs.com/betsyrosalen/DATA621_Blog_Post_1