Key terms

  • Independent Variable
  • Dependent Variable
  • Measures of Central Tendency
  • Measures of Variation
  • Histogram
  • Null Hypothesis
  • Alpha Level
  • P-value
  • Effect Sizes

independent variable aka ‘x’ variable

This is often considered the variable that influences, or causes, changes in the ‘y’ variable. In the fake graph below, the independent variable is miles jogged.

Dependent variable, outcome variable, aka ‘y’ variable

This is often considered the outcome variable because its value depends on the x value. Below in this fake graph, the dependent variable is body fat.

This is often considered ‘the effect’ when dealing with “cause effect” relationships.

So, all else being equal, the more you run, the more your body fat will decrease.

Measure of Central Tendency

Mean: the arithmetic average where you sum all of the data points and divide by the total number of data points

Median: the middle score. Visually, it’s found by sorting all of the data and picking the data point with equal numbers of data points on either side. It is also known as the 50th percentile, which implies you could calculate any percentile. For example, the 25th percentile would mean that 25% of the scores are below that point while 75% are above.

Mode: the most common score. It is not uncommon to have more than one mode, in which case we might call the distribution bimodal or trimodal, as the case may be.
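To make these concrete, here is a minimal R sketch using a made-up vector of scores. Note that base R has no built-in function for the statistical mode, so the little helper below is just for illustration.

scores <- c(2, 3, 3, 5, 7, 8, 9)   # made-up data

mean(scores)              # arithmetic average: sum of scores / number of scores
median(scores)            # middle score after sorting (the 50th percentile)
quantile(scores, 0.25)    # the 25th percentile

# a tiny helper for the most common score(s); may return several values if multimodal
most_common <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}
most_common(scores)       # returns 3 for this made-up data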

Measure of Variation

Measures of variation attempt to summarize the degree to which data cluster around a measure of central tendency.

The most common measure of central tendency is the mean, and if you think of that average as a kind of gravity, then variation describes how the data points orbit that average.

You can think of Saturn as the center of gravity, the arithmetic mean, and that the rings of Saturn are made up of data points (asteroids).

If the data points are really close around the average then you would have a small amount of variation; if the data points are very spread out then you expect a lot of variation.

Standard Deviation: this is, roughly, the average amount a data point differs from the mean. Small numbers mean that there is very little variation, while large numbers imply lots of variation.

Histograms

A histogram is a visual representation of counts of data points. Histograms are also known as frequency distributions, so it’s not surprising that counting the frequency of data is their main purpose.

You should watch this video which is a nice gentle introduction to histograms. (StatQuest with Josh Starmer 2017)

Example measures of central tendency

The data below is a data set curated by the Personality Project (“Personality Project,” n.d.).

It represents 2800 participants in a study on personality. You may remember a few weeks ago I introduced the big five model of personality.

The five major components are openness, conscientiousness, extraversion, agreeableness, and neuroticism. Here are some brief stats on the demographic information for these participants:

##      gender        education         age       
##  Min.   :1.000   Min.   :1.00   Min.   : 3.00  
##  1st Qu.:1.000   1st Qu.:3.00   1st Qu.:20.00  
##  Median :2.000   Median :3.00   Median :26.00  
##  Mean   :1.672   Mean   :3.19   Mean   :28.78  
##  3rd Qu.:2.000   3rd Qu.:4.00   3rd Qu.:35.00  
##  Max.   :2.000   Max.   :5.00   Max.   :86.00  
##                  NA's   :223
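A readout like this could come from something along the following lines. This is a sketch assuming the psych package’s bfi data set, which matches the 2800-participant description; the exact code used isn’t shown in the original.

# install.packages("psych")   # if needed
library(psych)

data(bfi)                                         # 2800 participants: Big Five items plus demographics
summary(bfi[, c("gender", "education", "age")])   # quick five-number summaries of the demographics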

Example of variation

And the standard deviation (a measure of spread around the center of gravity):

Variable     Standard deviation
gender       0.4696471
education    1.1077139
age          11.1275548

So, for example, age varies around the mean (28.782) by about 11.128 years (i.e., the data mostly fall within \(\pm\) 11.128 years of 28.782).
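Those standard deviations could be computed along these lines (again assuming the bfi data from above; na.rm drops the missing education values):

sd(bfi$age, na.rm = TRUE)          # roughly 11.13 years of spread around the mean age
sd(bfi$education, na.rm = TRUE)
sd(bfi$gender, na.rm = TRUE)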

Example histogram

So, let’s look at the histogram of age:

Not surprisingly, since most research published in psychology comes from college samples, the most common score looks to be around 20. But you do notice that the range goes up even into the 80s.

A word about normal distributions

Many of you are familiar with the bell-shaped curve that comes from the probability distribution most commonly known as the normal distribution.

This particular data set of age is not normally distributed.

Imagine that I took a random sample of 35 people and then took the average of that sample. It might look like this:

##  [1] 29 31 19 19 27 22 25 21 27 40 22 19 20 37 23 40 23 49 33 22 61 12 35 52 37
## [26] 26 20 22 40 20 23 19 26 19 22
## [1] 28.05714
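Drawing one such sample might look like this (a sketch assuming the ages live in bfi$age, as above; the particular 35 people will differ every run):

one_sample <- sample(bfi$age, size = 35)   # pick 35 ages at random
one_sample                                 # the 35 individual ages
mean(one_sample)                           # the average of this one sample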

Let’s do this 20 times

Several samples

Imagine we did this a thousand times, but instead of showing individual histograms, we just took the average of each sample and treated it as a data point. We would have a thousand averages. For now, I’ll do it 20 times.

twenty averages

##  [1] 17.00000 18.50000 23.00000 23.25000 29.40000 27.66667 28.28571 25.50000
##  [9] 23.22222 39.00000 31.00000 26.00000 27.92308 30.42857 35.40000 27.68750
## [17] 31.88235 25.55556 27.00000 27.35000

These are averages.

We took 20 samples of 35 people each and listed the average of each sample.

Now let’s do that, but 100 times. I won’t show you the individual averages but instead will make a histogram.

histogram of a sampling distribution

Let’s do it again but with 1000 samples

Notice that it’s looking more normal.
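Here is a minimal sketch of that whole resampling idea (still assuming the ages are in bfi$age): draw many samples of 35, keep only each sample’s mean, and histogram those means.

sample_means <- replicate(1000, mean(sample(bfi$age, size = 35)))   # 1000 sample averages
hist(sample_means,
     xlab = 'mean age of a sample of 35',
     main = 'sampling distribution of the mean')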

The key point is that when we do a real experiment, it’s like we are taking one of these little samples. If we were to take another sample, and another, and average each one, we’d be getting closer and closer to the ‘true’ average.

Also, many of you are familiar with the bell shape. It’s just this histogram with infinitesimally narrow vertical rectangles.

smaller rectangles

This is the same data but with more ‘bins’. The bins are getting smaller and smaller, and so the jaggedness of the histogram is smoothing out.

And all of this is actually about probability…the area under the curve is just the area of a rectangle (or set of rectangles) divided by the total area of all the rectangles.
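To see that idea concretely, here is a tiny sketch (using simulated data, since the point is generic): with a density-scaled histogram, the rectangle areas add up to 1.

x <- rnorm(10000)                          # any data set will do for this demonstration
h <- hist(x, breaks = 100, freq = FALSE)   # freq = FALSE puts density, not counts, on the y axis
sum(h$density * diff(h$breaks))            # the areas of all the rectangles sum to 1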

Null Hypothesis

Broadly speaking, the null hypothesis is a statement or prediction that ‘X’ will not cause changes in ‘Y’. This can show up in various forms; a simple one is where two groups, a control group and one that receives an experimental medication, are believed to have the same outcomes, i.e., that the medication won’t work.

You may want to investigate why the null is the baseline prediction.

Alpha

This is the proportion of the time you are willing to be wrong when doing an experiment.

It’s more complicated than this, but if you imagine doing an experiment, where the differences between groups may not be obvious, you should be transparent about your threshold for being wrong. In other words, if I perform the same experiment over and over, how many times will I find a ‘real’ effect, and how many times will I find a “chance” effect?

In psychology, the tradition is 5%. There is history to this that is worthy of letting go, but it’s nonetheless a common threshold. It is saying that if you did the experiment over and over, say 100 times, then about 5 times out of 100 we’d find an effect that was just random chance and therefore not real.

You can’t know for sure, so you say that 5 out of 100 times you would be willing to be wrong. By the way, this definition I’ve given you is just a part of what alpha is. I have given you basically the Type I error rate.

p-value

The p-value is the probability of finding the observed effect, or one larger, if the null hypothesis is true. It’s saying that when you find a difference between groups, the chance of finding that difference (or one larger), assuming that the null is true, has a certain probability: the p-value.

For example, let’s say I do a study on the effect regular dancing (IV) has on life satisfaction (DV). Let’s say I have 2 groups: the control group does no dancing, as in no physical activity during the week, while the experimental group does 1 hour of dancing 5 days a week.

Let’s say I find that the average life satisfaction reports (the DV) are different between the two groups, say control = 2/5 and dance = 4/5, and that after performing the appropriate test (a t-test, most likely) the p-value is .03. What does that mean?

P value .03

If the null hypothesis is true, meaning there is no difference between the two groups, then a difference as large as the one found (2 vs. 4) would turn up in only about 3 out of 100 such experiments.

In other words, if we did this exact same experiment 100 times and there is no effect of dancing on life satisfaction, we would still find chance differences of 2 (4 - 2) or larger about 3% of the time.

It’s a mouthful.

Effect Sizes

These are statistical efforts at comparing outcomes across different studies. Taking into account each study’s properties (sample sizes, demographics, methods), researchers can conduct a “meta-analysis” that compares similar studies and their outcomes to see if the factors of influence can be organized, or sorted.

For example, in physics, on our planet gravity has the largest effect size on the speed at which an object falls from a specific height. Friction from the air also impacts the speed of descent, but it has a smaller “effect”.

Meta analyses

These use studies as data points. Instead of calculating an average like we just did with the ages of the personality participants, you calculate a different statistic for each study. You put that statistic into a bin, then do it again with another study, over and over, until you have enough studies to take the average of this statistic.

common effect size statistic

What is the statistic? There are a few different versions of it, but the classic example is what’s known as Cohen’s d. It’s basically the difference between two group averages, standardized. So in our example above, we had an average of 2 versus an average of 4, so the effect size in this particular study would be 2/(pooled standard deviation). If you want to know more, you’ll want to look this up.

Dividing by that pooled standard deviation is a way to standardize the difference between groups. This is how we can compare across different studies.

And that is what Cohen’s d does. As a general rule you interpret an effect size like this:

d = 0.20 indicates a small effect,

d = 0.50 indicates a medium effect and

d = 0.80 indicates a large effect.
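Going back to the dancing example, here is a minimal sketch of computing Cohen’s d by hand. The life-satisfaction scores are made up for illustration, and the pooled standard deviation is the standard degrees-of-freedom-weighted version.

control <- c(1, 2, 2, 3, 2)   # hypothetical life-satisfaction ratings
dance   <- c(3, 4, 5, 4, 4)

pooled_sd <- sqrt(((length(control) - 1) * var(control) +
                   (length(dance)   - 1) * var(dance)) /
                  (length(control) + length(dance) - 2))

d <- (mean(dance) - mean(control)) / pooled_sd
d   # well above 0.80, a large effect by Cohen's rough guidelines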

Intro to stats for abnormal psych students

The big idea about research is that it is incredibly rare to perform one study/experiment and have it change the known literature. Science is 99.99% incremental.

So, when you learn about a topic via reading journal articles (experiments) be humble about what you find.

Also, remember the 3 main parts of John Stuart Mill’s thoughts on cause and effect. To establish causation you need:

  1. Co-variance between variables (correlation)

  2. Temporal precedence (the cause has to come before the effect)

  3. All other explanations must be ruled out.

    The last one, 3, is the hard part.

Stat book contents

This is a skeletal outline of most intro stat books:

What are numbers
How to graph
Probability
Probability Distribution broadly
Probability Distributions, specifically the Normal, central limit theorem
Z-scores
T-tests
Correlations
ANOVA
Linear Regression
Chi-Square
Effect Sizes

In abnormal psychology you see a lot of linear regressions. You also see a lot of effect sizes.

To have an introductory understanding of regression, you should understand probability and Probability Distributions.

So, let’s jump into probability distributions by first looking at some graphs.

Graphing with a histogram

Histograms are useful for seeing how small sets of data “look”. It is often useful to graph your data this way to see where the data actually sit.

Imagine we wanted to discover how tall people are. Let’s say I can sample 100 people at random from Seattle. Here they are, listed in inches:

##   [1] 64.1 65.7 65.6 66.5 66.6 65.6 65.0 67.0 65.4 66.0 67.0 68.0 66.2 66.4 65.0
##  [16] 65.0 66.6 66.2 65.8 66.8 66.1 65.6 66.1 66.9 65.9 65.8 68.8 67.3 65.7 66.6
##  [31] 65.0 64.2 66.1 66.6 66.7 65.9 66.8 66.2 67.3 66.1 66.0 67.5 65.8 66.2 65.5
##  [46] 66.1 65.7 66.6 64.1 64.4 65.6 66.6 67.3 66.2 65.8 65.4 64.5 65.6 64.5 65.4
##  [61] 65.5 65.5 64.3 65.9 67.3 65.2 65.9 66.8 66.3 64.6 66.0 67.4 65.5 67.6 66.1
##  [76] 65.1 66.7 66.4 65.5 65.4 65.6 67.2 67.1 66.3 65.4 65.6 65.4 64.9 66.2 67.4
##  [91] 66.5 66.0 66.4 65.7 66.4 66.6 66.3 65.5 66.1 65.9

We can see this data more easily if we graph it. There are many ways to do so, but this is a classic one: the histogram.
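If those 100 values were stored in a vector, say heights (a name I’m assuming for this sketch), the plot could be produced like this. Note that hist() uses right-closed bins by default, so a value of exactly 64.5 lands in the first bar.

hist(heights,
     breaks = seq(64, 69, by = 0.5),               # half-inch bins from 64 to 69 inches
     xlab = 'height (inches)', ylab = 'frequency',
     main = 'heights of 100 Seattle residents')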

another histogram

What you should see here is that the vertical ‘y’ axis is a frequency, a count, while the ‘x’ axis is the range of scores. So, from 64 to 64.5 inches tall, there appears to be about 7 people. If you look at our original data, you should see this is true:

sorted heights

##   [1] 64.1 64.1 64.2 64.3 64.4 64.5 64.5 64.6 64.9 65.0 65.0 65.0 65.0 65.1 65.2
##  [16] 65.4 65.4 65.4 65.4 65.4 65.4 65.5 65.5 65.5 65.5 65.5 65.5 65.6 65.6 65.6
##  [31] 65.6 65.6 65.6 65.6 65.7 65.7 65.7 65.7 65.8 65.8 65.8 65.8 65.9 65.9 65.9
##  [46] 65.9 65.9 66.0 66.0 66.0 66.0 66.1 66.1 66.1 66.1 66.1 66.1 66.1 66.2 66.2
##  [61] 66.2 66.2 66.2 66.2 66.3 66.3 66.3 66.4 66.4 66.4 66.4 66.5 66.5 66.6 66.6
##  [76] 66.6 66.6 66.6 66.6 66.6 66.7 66.7 66.8 66.8 66.8 66.9 67.0 67.0 67.1 67.2
##  [91] 67.3 67.3 67.3 67.3 67.4 67.4 67.5 67.6 68.0 68.8

And here is the data pulled out so you aren’t overtaxing your eyes: 64.1, 64.1, 64.2, 64.3, 64.4, 64.5, 64.5

But what I want you to really see is the shape of this histogram. It is beginning to look like a bell curve, the classic normal distribution. If we could sample a somewhat larger number of people, say 200, you would see a shape closer to this normal curve:

And what about 1000 people?

I hope you see where this is going. Because then the following makes some sense:

The area under the curve on the right is the probability of those values on the x axis occurring.

To understand why this is, you’ll want to take calculus (at least I and II) to understand the Fundamental Theorem of Calculus.

area under curve or in the bins

Back to stats. For example, look back at the histogram of 100 people. We can use the number of people between 64 and 64.5 inches. In this case there are 7, and since there are 100 people, you can do the quick calculation that 7 out of 100 is 0.07. If you were to ask the question “what is the chance that someone in our sample is between 64 and 64.5 inches tall”, you would say about 7%.
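In code, that quick calculation is just a proportion (again assuming the heights vector from above):

sum(heights > 64 & heights <= 64.5)    # 7 people fall in that half-inch bin
mean(heights > 64 & heights <= 64.5)   # 7 / 100 = 0.07, about a 7% chance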

The point here is that if you know something about a sample’s distribution, you can start making some good guesses about the larger population. And this is the cornerstone of inferential stats, and p-values are supposed to guide us in this work.

Big leap to correlations

I’m going to load a data set that has 231 cases, people, who have given data about their personality. I’ve no idea who these people are, though I imagine there is some information online. You can probably recognize some of these variables below.

datafilename <- "http://personality-project.org/r/datasets/maps.mixx.epi.bfi.data"
person.data  <- read.table(datafilename,header=TRUE)  #read the data file
str(person.data)
## 'data.frame':    231 obs. of  13 variables:
##  $ epiE    : int  18 16 6 12 14 6 15 18 15 8 ...
##  $ epiS    : int  10 8 1 6 6 4 9 9 11 5 ...
##  $ epiImp  : int  7 5 3 4 5 2 4 7 3 2 ...
##  $ epilie  : int  3 1 2 3 3 5 3 2 3 2 ...
##  $ epiNeur : int  9 12 5 15 2 15 12 10 1 10 ...
##  $ bfagree : int  138 101 143 104 115 110 109 92 127 74 ...
##  $ bfcon   : int  96 99 118 106 102 113 58 57 108 100 ...
##  $ bfext   : int  141 107 38 64 103 61 99 94 108 61 ...
##  $ bfneur  : int  51 116 68 114 86 54 55 72 35 87 ...
##  $ bfopen  : int  138 132 90 101 118 149 110 114 86 89 ...
##  $ bdi     : int  1 7 4 8 8 5 7 0 0 7 ...
##  $ traitanx: int  24 41 37 54 39 51 40 32 22 35 ...
##  $ stateanx: int  22 40 44 40 67 38 32 41 26 31 ...

more Big 5 data for correlations

We’ve talked about neuroticism in the context of the Big 5 OCEAN but not of the PEN, a different trait theory about personality. The N in PEN stands for neuroticism, and so let’s learn about the correlation between these two measures of neuroticism.

First, it helps to plot the data. On the x-axis we’ll put the big 5 Neuroticism and the y-axis will have the PEN.

Positive correlation

This plot shows the data leaning, yes? This is visually describing a “positive” correlation: when one variable moves, the other variable moves in the same direction.

If they moved in opposite directions, one goes up, the other down, you’d have a negative correlation. If the two variables were unrelated, then the correlation would look much like a big scatter of data points with no obvious trend.

A correlation is a numerical representation of these trends. It exists as a number between -1 … 0 … +1. For this data set, the correlation happens to be 0.63
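With the data loaded earlier, that number comes straight from cor(), which defaults to the Pearson correlation:

cor(person.data$bfneur, person.data$epiNeur)   # about 0.63 for this data set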

You can also visualize a correlation as the best fitting line of this data. And this is basically what linear regression is–though once you get more than 2 variables it’s not that simple.

Linear Regression

You may remember in past math courses something about the best fitting line. You most likely remember that one of the formulas for a line is \(y=mx+b\).

Well, a regression is sort of like finding the variable ‘m’. It’s not the same thing, but if you accept that there are mathematical techniques to find a ‘best fitting line’, well, this would be a linear regression. If you want to know what ‘best fitting’ is, you’ll need to take some stat classes, and to really ‘get it’ some calculus (using derivatives to find a minimum).

But here is that line for these two variables:

You should notice the line doesn’t fit over every single data point. What it is trying to do is minimize the overall difference scores from the line to each of the data points. There are an infinite number of lines that could be drawn over this set of data, but this line has special meaning.
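Here is a sketch of fitting and drawing that line with the variables already loaded. Neurot.lm is just a name I’m choosing for the raw-scale model, while Neurot.lm.scaled is the standardized model whose summary appears a bit further down.

# scatterplot of the two neuroticism measures
plot(person.data$bfneur, person.data$epiNeur,
     xlab = 'Big 5 neuroticism', ylab = 'PEN neuroticism')

# best fitting line on the raw scale, drawn over the points
Neurot.lm <- lm(epiNeur ~ bfneur, data = person.data)
abline(Neurot.lm, col = 'red')

# standardized version, which matches the summary output shown later
Neurot.lm.scaled <- lm(scale(epiNeur) ~ scale(bfneur), data = person.data)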

linear regression by hand?

You can do these calculations by hand with small data sets and when you are using just 2 variables. More variables require linear algebra, and the more variables you add, the faster the time it takes to solve by hand grows.

Yay for computers. Cuz they’re fast.

The best fitting line

This best fitting line is the line whose overall distance from the data points is the smallest compared to all other lines.

This sort of line often implies a causal relationship, or at least that one variable ‘X’ impacts the other variable ‘Y’. Maybe it’s causal, maybe there are other variables that are doing the ‘causing’. Maybe it’s dumb random chance. You’ve heard the saying that correlation doesn’t equal causation? Well, that’s a thing to worry about when evaluating studies (see John Stuart Mill above). (By the way, technically it’s “correlation doesn’t imply causation”.)

Linear Regression output

When you see a paper report a linear regression, the output will look something like this:

summary(Neurot.lm.scaled)
## 
## Call:
## lm(formula = scale(epiNeur) ~ scale(bfneur), data = person.data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.02910 -0.58395 -0.05169  0.50306  2.52564 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.371e-16  5.134e-02     0.0        1    
## scale(bfneur) 6.275e-01  5.145e-02    12.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7803 on 229 degrees of freedom
## Multiple R-squared:  0.3937, Adjusted R-squared:  0.3911 
## F-statistic: 148.7 on 1 and 229 DF,  p-value: < 2.2e-16

There is a lot here, but the key number is the “beta” Estimate for scale(bfneur), 0.62747, which basically says that as x moves one unit, y will move 0.62747 units up. You should notice that this is the same as the correlation we found before.

That means that if we standardize scores for 2 variables, their correlation will be the same as their regression (beta) coefficient. This only works for 2 variables.

Regression output matching the video

The unscaled regression coefficient, like in the video, is 0.13175.

regression of variable against itself

Imagine a scenario where you took the same variable and plotted it against itself:

It’s a straight line. Now let’s add a little variation and run a regression to see how big the Beta coefficient gets. I’m just trying to make the two data sets not equal without losing the correlation.
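A sketch of how that noisy copy might be created and fit (the size of the noise is my guess; the model call matches the summary shown below):

# add a little random noise so the two copies are no longer identical
bfneur.noise <- person.data$bfneur + rnorm(nrow(person.data), mean = 0, sd = 3)

# regress the noisy copy on the original and look at the beta coefficient
summary(lm(bfneur.noise ~ person.data$bfneur))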

another regression variable against self

summary output

## 
## Call:
## lm(formula = bfneur.noise ~ person.data$bfneur)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2616 -1.9761  0.1273  2.1929  8.4392 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -1.077221   0.815082  -1.322    0.188    
## person.data$bfneur  1.011355   0.008957 112.912   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.17 on 229 degrees of freedom
## Multiple R-squared:  0.9824, Adjusted R-squared:  0.9823 
## F-statistic: 1.275e+04 on 1 and 229 DF,  p-value: < 2.2e-16

The summary output shows 1.01136 for the beta coefficient: basically a move of 1 unit in y for every move of 1 unit in x. Beta coefficients can be larger than 1, whereas correlations must be between -1 and +1.

To see how correlations are related to the beta coefficient, and if you understand what a standard deviation is, here is the conversion, which only works for a single x and y pairing. Multiple X’s won’t work this way.

Stat Name                    Statistic
Correlation                  0.6274718
Standard Dev for PEN Neur    4.8998362
Standard Dev for b5 Neur     23.3361809
Cor × (SD.pen / SD.bf)       0.1317486
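In code, with the variables loaded earlier, that conversion is just (a small sketch using base R’s cor() and sd()):

r      <- cor(person.data$bfneur, person.data$epiNeur)
sd.pen <- sd(person.data$epiNeur)    # spread of PEN neuroticism
sd.bf  <- sd(person.data$bfneur)     # spread of Big 5 neuroticism

r * (sd.pen / sd.bf)                 # reproduces the unscaled beta, about 0.13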

So, a linear regression is a process for finding a best fitting line over data, and for 2 variables it gives you a visual image of a correlation. You can include many variables when doing this, and the process will show you which variables have more or less influence over the outcome variable.

Return of the P value

So to try and give some intuition about a p-value, I often use coin-flipping as an example.

Imagine a fair coin, where getting Heads or Tails is equally likely. 50/50.

If you flipped the coin 10 times, would you expect to get 5 of each side? Would you be upset if it were 6 to 4? What if you flipped the coin 1,000 times? Would you get 500 heads? I can simulate this. Imagine that heads = ‘1’ and tails = ‘0’.

coin_set <- rbinom(1000, 1, .5)   # 1000 flips of a fair coin: 1 = heads, 0 = tails
hist(coin_set, xlab='outcome value, 0 or 1', main='counts of 0s and 1s')
abline(h=sum(coin_set), col='red')   # red line at the total number of heads

Above you should see there are 510 heads.

is it fair?

Seems like a fair coin, but notice that if we did this over and over we wouldn’t always get 50% heads/tails. Let’s do this again with a smaller trial of 10 coin flips, but repeat it 1000 times. So, I’ll run 1000 simulations of 10 coin flips, and will end up with 1000 counts of heads.

lots of coin flips

d <- matrix(data=replicate(1000,rbinom(10,1,.5)),ncol=10)   # 10,000 fair coin flips arranged into a 1000 x 10 matrix; each row is treated as one set of 10 flips
head(apply(d,1,sum),10)   # number of heads in each of the first 10 rows
##  [1] 4 7 5 3 5 5 9 5 6 1
mean(head(apply(d,1,sum),10))  #average of these first 10
## [1] 5

So, what you are seeing here are the first 10 of these 1000 simulations; each number represents the count of “heads” in ten coin tosses.

First simulation of coins

The first sum is 4. That means of those 10 flips, 4 of them were ‘heads’. If I plot the whole 1000 simulations, you will see a nice bell curve, with most of these coin flips being between 3 and 6:

meaning of a P-value

In this case the mean of these flips is denoted by the blue line, and so probably reflects your intuition about a coin toss. But notice the huge range of outcomes.

Even though the average of flipping all these coins is very close to 5 out of 10, there were some trials where you got 2 heads, and some cases 9 heads.

And yet this is coming from a “fair” coin. It is absolutely flipping coins with a .5 chance of heads, but there is random variation.

p-value continued

p-values tell you how likely you would be to find a difference assuming the null is true, and in this case you can see that even with a fair coin, it’s not impossible to get very rare outcomes. If you flipped a coin 10 times and got 2 heads, you could very well still assume that it’s a fair coin.

And that is the problem with doing research. If you find a difference between 2 groups, is that difference real? The p-value is answering: assuming the groups are the same, what is the chance that I’d get a result this rare or rarer?

p value of coins flips

As an example in coin terms, I’m asking: under the assumption of a fair coin (the null hypothesis: an equal chance of heads or tails), what is the chance that I get 8 or more heads? The p-value gives you the chance of 8, 9, or 10 heads (remember the ‘or more extreme’ phrase). And so, you just have to add up all the sets of 10 flips that had 8 or more heads.

simulation of coins for p value
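The 0.053 below was presumably produced by something along these lines (a sketch assuming coin_totals holds the 1000 head-counts from the simulation above):

coin_totals <- apply(d, 1, sum)   # number of heads in each set of 10 flips
sum(coin_totals > 7) / 1000       # proportion of sets with 8 or more heads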

## [1] 0.053

And when you do that, you find that, out of the 1000 simulations, 53 of them had 8 or more heads. And if you divide that number by the number of simulations, you get the p-value for this little experiment, and it happens to be 0.053.

This is saying that if I flip a fair coin 10 times, over and over, about 5.3% of the trials will have 8 or more heads.

a real test to get same result

I can check this by running a real statistical test in this software, called the binomial test. In the code below you should see that the test takes the successes (simulations with 8 or more heads) and the failures (the rest), and estimates the probability of success, which you should see at the bottom of the readout. I hope my calculation above matches! You may also notice the p-value in the readout; that one is testing whether the chance of getting 8 or more heads is 0.5, which is a similar question, but a different hypothesis from the one we have been asking.

heads8 <- length(coin_totals[coin_totals>7])        # simulations with 8 or more heads ("successes")
tails8 <- 1000-length(coin_totals[coin_totals>7])   # the remaining simulations ("failures")
binom.test(x=c(heads8,tails8))                      # estimates the probability of success
## 
##  Exact binomial test
## 
## data:  c(heads8, tails8)
## number of successes = 53, number of trials = 1000, p-value < 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.03994925 0.06875518
## sample estimates:
## probability of success 
##                  0.053

References

“Personality Project.” n.d. Journal of Personality. https://www.personality-project.org/.

StatQuest with Josh Starmer. 2017. “StatQuest: Histograms, Clearly Explained.”