First considerations

You first need a clear idea of what you want to do.

  • ‘How do I do analyse the practical in R?’ is not a meaningful question.
  • Before embarking on any analysis, you need to clearly formulate the question that you seek to answer.
  • Only then can you think about how to translate it into R (or any other software).

All analyses boil down to a handful of plots and tests*

  • Compare one continuous variable \(y\) between two groups:
    box-plot | \(t\)-test or Wilcoxon test
  • Compare one continuous variable \(y\) between >two groups:
    box-plot | ANOVA or Kruskal-Wallis test
  • Relationship between two continuous variables \(x,y\):
    scatter-plot | Pearson or Spearman correlation test
  • Dependency of continuous \(y\) on continuous \(x\):
    depends…: scatter-plot, mean \(\bar{y}\pm \mathrm{SD}\) | regression

*) for undergraduate practicals anyway

Getting the data into R

Read the data into R and see what the structure is. E.g., for the tibial block experiment (and fictive data),

Exp2 <- read.csv(file = "TestData.csv")
head(Exp2)
  X      best     block
1 1 20.112260  4.984088
2 2 14.726384  6.453256
3 3 14.989512 18.839831
4 4 12.571759  7.476011
5 5  9.598989  7.906212
6 6 22.615174 19.844811

This shows that the information whether a jump was with tibia blocked or not is encoded in ‘wide’ format: see the companion HTML guide about ‘long’ vs ‘wide’ data. If you keep them wide, a simple boxplot would be

boxplot(Exp2$best, Exp2$block)

Comparing \(y\) between 2 groups

Example: Does blocking the tibia affect max. jumping distance?

To consider: Are the observations paired — i.e., were the unblocked and blocked distances obtained from the same individual locusts?

  • No: unpaired \(t\) or Wilcoxon test
wilcox.test(Exp2$best, Exp2$block, paired = FALSE)
  • Yes: paired \(t\) or Wilcoxon test
wilcox.test(Exp2$best, Exp2$block, paired = TRUE)

If you like to use with(), you’d say

with(Exp2, wilcox.test(best, block, [...])

Twitch data: continuous \(x\)?

Should \(x\) be treated as continuous or as categorical?

  • Inter-pulse interval is clearly continuous: could have been anything >0, not just what we chose to test.
  • Number of pulses is discrete (1, 2, 3…) — but not really categorical.
  • Joint angle is — ? [you tell me!]

From R perspective:

  • boxplot() works only if \(x\) is (treated as) categorical.
  • ggplot() allows you to plot medians and quartiles for groups along a continuous \(x\).

Deciding on how to plot

  • You need to decide what kind of plot is most suitable for making the points that you want to make in your analysis.
  • You should justify the choices you make.
  • There is nothing wrong about using a boxplot with ‘equally spaced boxes’ (categorical \(x\)) when the groups are not evenly spaced along \(x\).
  • But you need to be aware that the plot does not convey this particular aspect of the data (the unequal intervals along \(x\)).

Twitch data: categorical \(x\) axis

A boxplot of amplitude over IP interval is simple enough…

boxplot(amplitude ~ factor(IP.intv),
        data = Twitch.IP,
        xlab = "IP interval (ms)",
        ylab = "amplitude (mm)")

…but it doesn’t convey the uneven spacing along \(x\).

This has fewer values in \(x\) than your real data, but you get the idea!

Twitch data: continuous \(x\) axis

To get continuous \(x\), you can

  • use the ggplot2 library instead of base R plotting functions,
  • and the stat_summary() function to plot the medians and quartiles (that would make up a box plot) as bars.

Don’t forget to explain the bars in the legend!

library(ggplot2)
p <- ggplot(Twitch.IP, aes(x=IP.intv, y=amp)) +
    stat_summary(fun = median,
                 fun.min = function(x) quantile(x, 0.25),
                 fun.max = function(x) quantile(x, 0.75),
                 geom = "pointrange", size=1.5, shape=18) +
  geom_point(size=3, alpha=0.3)

Here we are storing the plot in p without actual plotting for now…

# ...and add the axes here:
p + xlab("IP interval (ms)") + ylab("twitch amplitude (mm)")

You could also use log-scale \(x\), thus:

p + xlab("IP interval (ms)") + ylab("twitch amplitude (mm)") +
  coord_trans(x="log10")

Or, different flavour of log axis:

p + xlab("IP interval (ms)") + ylab("twitch amplitude (mm)") +
  scale_x_continuous(trans="log10", breaks = c(1, 2.5, 5, 7.5, 10, 25, 50, 75, 100))