BS1040 Stats MCQ reassessment

1 Can we use embryonic stem cells to treat heart attack (in sheep)

Does treatment using embryonic stem cells (ESCs) help improve heart function following a heart attack? Each sheep in the study was randomly assigned to either the ESC group or the control group, and the change in its heart's pumping capacity was measured. A positive change corresponds to increased pumping capacity, which generally suggests a stronger recovery. Because we are comparing two independent groups of sheep, this question can be analysed with a two-sample test (a t-test or a Wilcoxon rank-sum test).

1.1 Plot the data

Start by using the dfSummary() function from the summarytools package to take a quick look at the stem.cell data. The data set is in the openintro package, so you'll need to install and load it. We first installed and loaded packages, and first used dfSummary, in session 1.
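A minimal sketch of this step in R, assuming openintro and summarytools install from CRAN as in session 1. The worksheet calls the data set stem.cell; newer openintro releases may name it stem_cell instead, so the sketch uses whichever one exists (the object name stem is an illustrative choice, not part of the worksheet).

```r
# install.packages(c("openintro", "summarytools"))  # run once if not installed
library(openintro)
library(summarytools)

# Older openintro releases (as used in this worksheet) call the data set
# stem.cell; newer releases may call it stem_cell. Use whichever exists.
stem <- if (exists("stem.cell")) stem.cell else stem_cell

# One table summarising every variable: type, levels, stats, missing values
dfSummary(stem)
```

In RStudio, view(dfSummary(stem)) shows the same summary as a formatted table in the Viewer pane.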

  1. Blackboard MCQ: From your dfSummary output, what are the two treatments (trmt) called in this study?

I think the easiest way to see whether pumping capacity changed is to create a new variable holding the difference between after and before (after minus before), so that a positive value means pumping capacity increased. Use mutate() (session 3) to create a new data frame called mystem that contains this additional variable.
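One way to write this with dplyr's mutate(), as a sketch: the new column name difference is a hypothetical choice, and the stem.cell/stem_cell fallback is there because newer openintro releases may rename the data set.

```r
library(openintro)
library(dplyr)

# stem.cell in older openintro releases, possibly stem_cell in newer ones
stem <- if (exists("stem.cell")) stem.cell else stem_cell

# difference = after - before: positive values mean pumping capacity increased
mystem <- stem %>%
  mutate(difference = after - before)

head(mystem)
```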

Next, check visually whether treatment had an effect on the change in pumping capacity. Plot this difference as a boxplot, with trmt on the x-axis and the difference on the y-axis. You first met boxplots in session 2.
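A sketch of the boxplot using ggplot2. This assumes ggplot2 is installed and may differ from the plotting code used in session 2 (base R's boxplot(difference ~ trmt, data = mystem) would do the same job). The column name difference is hypothetical, and the dashed line at zero is an optional extra marking "no change".

```r
library(openintro)
library(dplyr)
library(ggplot2)

# stem.cell in older openintro releases, possibly stem_cell in newer ones
stem <- if (exists("stem.cell")) stem.cell else stem_cell
mystem <- mutate(stem, difference = after - before)

# One box per treatment group; points above the dashed line improved
p <- ggplot(mystem, aes(x = trmt, y = difference)) +
  geom_boxplot() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Treatment (trmt)",
       y = "Change in pumping capacity (after - before)")
print(p)
```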

1.2 Does the data fit the assumptions of your analysis

A t-test has a number of assumptions. In the lecture, we touched upon the need for normality. Even without checking, I would be cautious here: with a sample this small, normality is hard to assess reliably and the t-test is less robust to departures from it. So for today, use the Wilcoxon rank-sum test (code given in the lecture) to see whether ESC treatment had a significant effect on recovery after a heart attack.
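The lecture's code may differ, but a minimal sketch of a two-sided Wilcoxon rank-sum test with base R's wilcox.test() looks like this. The formula difference ~ trmt compares the change in pumping capacity between the two independent treatment groups; difference is a hypothetical column name for after minus before.

```r
library(openintro)
library(dplyr)

# stem.cell in older openintro releases, possibly stem_cell in newer ones
stem <- if (exists("stem.cell")) stem.cell else stem_cell
mystem <- mutate(stem, difference = after - before)

# Two-sided, unpaired Wilcoxon rank-sum test: response ~ grouping factor
res <- wilcox.test(difference ~ trmt, data = mystem)
res
```

If the data contain tied values, R warns that it cannot compute an exact p-value and uses a normal approximation instead; the test still runs. A p-value below 0.05 is the usual threshold for calling the difference between groups significant.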

  1. Blackboard MCQ: From your boxplot and Wilcoxon test, does treatment with embryonic stem cells improve heart function following a heart attack?

2 In mammals, does body weight correlate with how long pregnancy lasts

Naively, we might assume that bigger mammals take longer to develop in the womb. The openintro package contains a data set we can use to test this, and here we are going to write a script that carries out a full analysis to answer the question.

2.1 Plot the data

Start by using the dfSummary() function from the summarytools package to take a quick look at the mammals data. The data set is in the openintro package, so you'll need to install and load it. We first installed and loaded packages, and first used dfSummary, in session 1.
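The same first step for the mammals data, as a minimal sketch. Depending on your openintro version, the column names may be CamelCase (BodyWt, Gestation, TotalSleep, as in this worksheet) or snake_case (body_wt, gestation, total_sleep); dfSummary shows whichever your version uses.

```r
# install.packages(c("openintro", "summarytools"))  # run once if not installed
library(openintro)
library(summarytools)

# Overview of every variable, including min, median and max for numeric ones
dfSummary(mammals)
```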

  1. Blackboard MCQ: What is the maximum total sleep (TotalSleep) recorded for any species?

Next, check visually whether body weight and gestation period are correlated by drawing a scatterplot. We first created a scatterplot in session 2.
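A sketch of the raw-scale scatterplot with ggplot2 (an assumption; session 2 may have used different plotting code). Because column names differ between openintro versions (BodyWt/Gestation in older releases, body_wt/gestation in newer ones), the sketch first converts CamelCase names to snake_case with dplyr's rename_with(), so it works with either; mam is a hypothetical object name.

```r
library(openintro)
library(dplyr)
library(ggplot2)

# Convert CamelCase names (e.g. BodyWt) to snake_case (body_wt);
# names that are already snake_case are left unchanged.
mam <- rename_with(mammals, ~ tolower(gsub("([a-z])([A-Z])", "\\1_\\2", .x)))

# Raw scale: body weight on the x-axis, gestation period on the y-axis
p <- ggplot(mam, aes(x = body_wt, y = gestation)) +
  geom_point() +
  labs(x = "Body weight", y = "Gestation period")
print(p)
```

ggplot2 may warn that it removed rows containing missing values: these are species with no recorded value for one of the two variables.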

This looks pretty odd, because a couple of species are so much heavier than the rest that all the other points are squashed into one corner. This is where a log scale comes in handy. Use mutate() (session 3) to create a new data frame called mymammals with an additional variable holding the base-10 logarithm (log10) of body weight, then make a new scatterplot using this variable.
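A sketch of the log-scale version, assuming ggplot2 for plotting. As the column names differ between openintro versions (BodyWt in older releases, body_wt in newer ones), the names are first converted to snake_case; log10_body_wt is a hypothetical column name.

```r
library(openintro)
library(dplyr)
library(ggplot2)

mymammals <- mammals %>%
  # CamelCase to snake_case, so older and newer openintro releases both work
  rename_with(~ tolower(gsub("([a-z])([A-Z])", "\\1_\\2", .x))) %>%
  # Base-10 log of body weight: a step of 1 on this axis is a 10-fold change
  mutate(log10_body_wt = log10(body_wt))

p <- ggplot(mymammals, aes(x = log10_body_wt, y = gestation)) +
  geom_point() +
  labs(x = "log10(body weight)", y = "Gestation period")
print(p)
```

An alternative is to keep the raw column and add scale_x_log10() to the plot, which gives the same picture with the axis labelled in original units; creating the column explicitly, as the worksheet asks, keeps the transformed values available in mymammals.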

  1. Blackboard MCQ: From this graph, what happens to gestation period as bodyweight increases?

2.2 Does the data fit the assumptions of your analysis

Remember that, to carry out a Pearson's correlation, the data must meet two assumptions:

  1. normality: both variables should be normally distributed
  2. linearity: the relationship between the two variables should follow a straight line

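There is no need to check normality here, as it's clear from the graph that the relationship is not linear, so Pearson's correlation is ruled out anyway. Spearman's rank correlation is less demanding: the relationship only needs to be monotonic, meaning that as one variable increases the other consistently increases (or consistently decreases), though not necessarily along a straight line.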
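To see why a monotonic relationship is a weaker requirement than a linear one, here is a small sketch using made-up toy data (not the mammals data): y = x^3 always rises with x but follows a curve, so Pearson's r falls short of 1 while Spearman's rho is 1, because the ranks of x and y match perfectly.

```r
# Made-up toy data: y rises with every increase in x, but along a curve
x <- 1:10
y <- x^3

pearson  <- cor(x, y, method = "pearson")   # about 0.93: not a straight line
spearman <- cor(x, y, method = "spearman")  # 1: the ranks agree perfectly

c(pearson = pearson, spearman = spearman)
```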

2.3 Carry out the correlation

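So Spearman's rank correlation looks like the right test here. Instructions for it can be found here. Important: use the raw (untransformed) body weight here, as it makes the analysis simpler and easier to explain. Because Spearman's correlation works on ranks, and taking log10 does not change the order of the values, the log-transformed body weight would give exactly the same rho anyway.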
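A minimal sketch using base R's cor.test() with method = "spearman"; the linked instructions may do this differently. The column names are first converted to snake_case (older openintro releases use BodyWt/Gestation, newer ones body_wt/gestation) so the sketch works with either version; mam and res are hypothetical object names.

```r
library(openintro)
library(dplyr)

# CamelCase to snake_case, so older and newer openintro releases both work
mam <- rename_with(mammals, ~ tolower(gsub("([a-z])([A-Z])", "\\1_\\2", .x)))

# Spearman's rank correlation on the raw (untransformed) body weight.
# cor.test() drops species with a missing value in either variable.
res <- cor.test(mam$body_wt, mam$gestation, method = "spearman")

res                   # full output: test statistic S, p-value and rho
unname(res$estimate)  # rho on its own
```

In the printed output, rho appears under "sample estimates". If R warns that it cannot compute exact p-values with ties, the rho estimate is unaffected; only the p-value is approximated.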

The correlation coefficient lies between -1 and 1, and the further it is from 0, the stronger the relationship:

  • -1 indicates a perfect negative correlation: whenever x increases, y decreases
  • 0 indicates no association between the two variables (x and y): y shows no tendency to rise or fall as x increases
  • 1 indicates a perfect positive correlation: whenever x increases, y increases too
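As a quick check of these three reference points, a sketch with made-up toy vectors (not real data): a perfectly increasing pattern, a perfectly decreasing one, and an order chosen so that the ranks show no overall trend.

```r
x <- 1:5

# Perfectly increasing: ranks of y match ranks of x
rho_pos  <- cor(x, c(10, 20, 30, 40, 50), method = "spearman")
# Perfectly decreasing: ranks of y are the reverse of ranks of x
rho_neg  <- cor(x, c(50, 40, 30, 20, 10), method = "spearman")
# Mixed order: rises and falls cancel out, so there is no monotonic trend
rho_zero <- cor(x, c(2, 5, 3, 1, 4), method = "spearman")

round(c(rho_pos, rho_neg, rho_zero), 2)  # prints  1 -1  0
```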
  1. Blackboard MCQ: What rho value did you find for your Spearman rank correlation between body weight and gestation?