Why Study Statistics? Why Use R to do it?

Two Key Ideas


  • Statistical Analysis
  • The software to do it
  • The two ideas are separate but connected:

[Figure: cake mix + oven = cake]

  • The oven bakes the cake mixture
  • You need both to get a result, but
    • the cake mix can’t cook itself
    • you can’t eat the oven!!
  • They are distinct things, but connected

Software \(\leftrightarrow\) Statistics

  • Cake mix \(\leftrightarrow\) Statistical method
  • Oven \(\leftrightarrow\) Statistical Software
  • Cake \(\leftrightarrow\) Result of analysis
  • … and beyond the pictures above
  • Decoration on cake \(\leftrightarrow\) Your interpretation

Why Statistical Analysis

Why bother with it?

  • Why do you do statistics?
    • Why don’t researchers just use common sense?

BUT

  • Is it really plausible that a “common sense” approach is trustworthy?

The Belief Bias Effect Situation A

  • A valid argument where the conclusion is believable:
    • No cigarettes are inexpensive (Premise 1)
    • Some addictive things are inexpensive (Premise 2)
    • Therefore, some addictive things are not cigarettes (Conclusion A)

The Belief Bias Effect Situation B

  • A valid argument where the conclusion is less believable:
    • Cataracts are more prevalent in elderly people (Premise 1)
    • Cigarette smoking reduces life expectancy (Premise 2)
    • Therefore, at a population level, higher smoking rates correlate with a lower incidence of cataracts (Conclusion B)

Commentary

Both arguments are valid: in each case the conclusion follows from the premises. In the second argument, though, there are good reasons to feel that the conclusion must be wrong (smoking is bad for you, right?). Nevertheless, the conclusion is a logical consequence of the premises.

The Belief Bias Effect Situation C

  • An invalid argument that has a believable conclusion:
    • No addictive things are inexpensive (Premise 1)
    • Some cigarettes are inexpensive (Premise 2)
    • Therefore, some addictive things are not cigarettes (Conclusion C)
  • The conclusion is true, but it doesn’t follow from premises 1 and 2 alone.

The Belief Bias Effect Situation D

  • An invalid argument with an unbelievable conclusion:
    • No cigarettes are inexpensive (Premise 1)
    • Some addictive things are inexpensive (Premise 2)
    • Therefore, some cigarettes are not addictive (Conclusion D)
  • The conclusion isn’t true, and it also doesn’t follow from premises 1 and 2 alone.

In an Ideal World

  • If common sense were a reliable guide:

                         Conclusion ‘feels’ true (A/C)   Conclusion ‘feels’ false (B/D)
Argument valid (A/B)     100% of people say ‘valid’      100% of people say ‘valid’
Argument invalid (C/D)   0% of people say ‘valid’        0% of people say ‘valid’

But

  • An actual study of this by Evans, Barston, and Pollard (1983) gave this result:

                         Conclusion ‘feels’ true (A/C)   Conclusion ‘feels’ false (B/D)
Argument valid (A/B)     92% of people say ‘valid’       46% of people say ‘valid’
Argument invalid (C/D)   92% of people say ‘valid’       8% of people say ‘valid’

What does this show?

  • People presented with a valid argument that contradicts their pre-existing beliefs find it hard even to recognise it as valid (they did so only 46% of the time).
  • People presented with an invalid argument that agrees with their pre-existing beliefs rarely notice that it is invalid (people in the study got this wrong 92% of the time!)
  • TL;DR: people tend to ‘believe what they want to believe’, regardless of the underlying logic.

Commentary

It’s just too easy for us to “believe what we want to believe”; so if we want to believe in the research data instead, we’re going to need a bit of help to keep our personal biases under control. That’s what statistics does: it helps keep us consistent.

NB: Thanks to Danielle Navarro and Emily Kothe for some of the material here; see https://learningstatisticswithr.com/book/index.html (in particular chapter 1)

Other Reasons

  • The argument above isn’t the only one, but I personally think it is a compelling one.
  • People also tend to see patterns in random data
  • Statistical tests check whether such patterns could plausibly have occurred at random
  • Visualising data
    • See trends
    • Identify unusual observations
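The point about seeing patterns in random data can be made concrete with a small simulation. This sketch (an illustration, not part of the original slides) repeatedly generates two completely unrelated random variables and runs `cor.test()` on each pair; the 20 observations, 1000 repetitions and 0.05 cut-off are arbitrary assumptions. Roughly 5% of these pure-noise data sets still look ‘significant’, which is the false-alarm rate a test at the 0.05 level is designed to allow.

```r
# Sketch: how often does pure noise produce an apparent correlation?
set.seed(1)                      # make the simulation reproducible

n_obs  <- 20                     # observations per simulated data set (assumed)
n_sims <- 1000                   # number of simulated data sets (assumed)

# In each data set x and y are generated independently, so any
# correlation between them is due to chance alone.
p_values <- replicate(n_sims, {
  x <- rnorm(n_obs)
  y <- rnorm(n_obs)
  cor.test(x, y)$p.value         # p-value for the hypothesis "no correlation"
})

# Proportion of noise-only data sets that look "significant" at 0.05;
# this should come out close to 0.05.
false_alarms <- mean(p_values < 0.05)
print(false_alarms)

# Plot one noise-only data set: any "trend" you see here is chance.
plot(rnorm(n_obs), rnorm(n_obs), xlab = "x (noise)", ylab = "y (noise)")
```

Re-running the plot line a few times is a quick way to see how often the eye finds a ‘trend’ that the data generation never put there.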

Why R?

A broad set of reasons (NB: this section is less philosophical!)

IT’S FREE

  • Works on a variety of operating systems (Mac/Windows/Linux)
  • Good graphics
  • Extensible
  • Programmable

IT’S FREE

Problems with Spreadsheets

  • Doing statistics in a spreadsheet (e.g. MS Excel) is generally a bad idea in the long run.
  • OK for entering data but…
    • Very limited in the analyses they allow you to do.
    • Graphics not suited to (social) scientific work
    • More business-oriented graphics than statistical plots
    • e.g. no density plots such as this:
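By contrast, a density plot like the one mentioned above takes only a couple of lines in R. This sketch uses the `faithful` data set that ships with R (eruption durations of the Old Faithful geyser), chosen here purely as an example; `density()` estimates a smooth curve from the data with a kernel smoother, using R’s default settings.

```r
# Sketch: a kernel density plot using a data set built into R.
durations <- faithful$eruptions      # eruption durations in minutes

d <- density(durations)              # kernel density estimate (default Gaussian kernel)
plot(d,
     main = "Old Faithful eruption durations",
     xlab = "Duration (minutes)")
rug(durations)                       # tick marks showing the individual observations
```

The two peaks in the resulting curve are the kind of feature a standard spreadsheet chart makes hard to see.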

Proprietary Stats Software v. R
(i.e. a paid-for licence in some form)

  • Avoiding proprietary software is a very good idea!
  • Some of it is good, but very expensive
  • Open source alternatives exist
    • Mainly R and Python
    • R was specifically designed for stats
    • Although Python also good (I sometimes teach it)
  • Open source also makes the ‘under the bonnet’ code open to scrutiny

Extensibility

  • R can load packages which extend its functionality
  • These are often also written in R. Examples include:
    • sf: adds handling of geographical (spatial) data
    • tmap: map drawing, including interactive maps

Reproducibility

  • R is a programming language
  • Thus, an analysis is a file of scripted procedures (an R script)
  • This is useful, as you have a precise record of what you have done
    • Your analysis is then open to scrutiny and checking
    • You can refer back to it yourself
    • Easy to modify if you need to do a similar analysis later
    • Easy to share with others
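As an illustration of what such a record can look like, here is a minimal sketch of an analysis script. The file name `analysis.R` is hypothetical, and the built-in `mtcars` data are used only as an example. Because every step is written down, re-running the file (for instance with `source("analysis.R")` in R, or `Rscript analysis.R` from a terminal) repeats the analysis exactly.

```r
# analysis.R (hypothetical file name): a complete, re-runnable record
# of a small analysis using the mtcars data set built into R.

# Step 1: describe the outcome variable (fuel economy, miles per gallon)
print(summary(mtcars$mpg))

# Step 2: fit a simple linear regression of mpg on car weight
model <- lm(mpg ~ wt, data = mtcars)

# Step 3: report the estimated intercept and slope
print(coef(model))

# Step 4: save a plot of the data with the fitted line to a PDF file
pdf("mpg_vs_weight.pdf")
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(model)
invisible(dev.off())
```

Modifying the analysis later (say, adding another predictor) means editing one line and re-running the script, rather than trying to remember a sequence of menu clicks.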

Practicalities

Getting R Set Up

RStudio

Actually Doing Some R
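A first taste of R, assuming R (and optionally RStudio) is already installed: the commands below can be typed at the console one line at a time. The five height values are made up purely for illustration.

```r
# Store some hypothetical heights (in cm) in a vector
heights <- c(165, 172, 158, 180, 169)

mean(heights)     # average height: 168.8
sd(heights)       # standard deviation
summary(heights)  # minimum, quartiles, median, mean and maximum
hist(heights)     # quick histogram of the values
```

Each line is a function applied to the same vector, which is the basic pattern most R analyses build on.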

Conclusion

💡 New ideas

  • New general ideas
    • why do statistics at all?
  • New techniques
    • why use R to do statistics?
  • Practical issues
    • Installing R and RStudio
  • Next lecture - Exploring data with graphics