SR-M13 week 4 assignment 2026

This assignment is marked out of 40. It constitutes 8% of the total module mark.

Name: insert your name here

Student number: insert here


PART A - DATA ANALYSIS

Dataset for analysis

You will use a dataset of countermovement jump (CMJ) measurements.

cmj <- read.csv("CMJ_data.csv")

#### IF LOADING AN R WORKSPACE FILE (.Rdata) INSTEAD, USE THIS COMMAND ###########
# load("CMJ_data.Rdata")   # restores the saved object(s) under their original name(s)

Q1. CORRELATION
Create correlation plots of concentric impulse vs. weight and relative peak power vs. weight. Present the plots in a 2-row, 1-column layout and set the figure size to 4 in (width) x 8 in (height).

Calculate the Pearson correlation coefficient in each case and add its value to the plot as an annotation (to 2 d.p.). [6 marks]
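
A minimal base-R sketch of the two annotated scatter plots is given below. The column names `weight`, `conc_impulse` and `rel_peak_power` and the helper `plot_with_r()` are hypothetical (check `names(cmj)` for the real ones), and the small synthetic data frame exists only so the sketch runs on its own; it is not real jump data.

```r
# Synthetic stand-in for the CMJ data: hypothetical columns, NOT real measurements
set.seed(1)
demo <- data.frame(weight = runif(30, 55, 95))
demo$conc_impulse   <- 2.5 * demo$weight + rnorm(30, sd = 10)
demo$rel_peak_power <- 60 - 0.1 * demo$weight + rnorm(30, sd = 5)

# Hypothetical helper: scatter plot of y vs x, least-squares line,
# and the Pearson r (2 d.p.) added as a text annotation
plot_with_r <- function(x, y, xlab, ylab) {
  r <- cor(x, y, method = "pearson", use = "complete.obs")
  plot(x, y, xlab = xlab, ylab = ylab, pch = 19)
  abline(lm(y ~ x), col = "red")
  legend("topleft", legend = sprintf("r = %.2f", r), bty = "n")
  invisible(r)
}

# 4 in wide x 8 in high figure, laid out as 2 rows x 1 column
out_file <- file.path(tempdir(), "cmj_correlations.pdf")  # change the path as needed
pdf(out_file, width = 4, height = 8)
par(mfrow = c(2, 1))
r1 <- plot_with_r(demo$weight, demo$conc_impulse, "Weight", "Concentric impulse")
r2 <- plot_with_r(demo$weight, demo$rel_peak_power, "Weight", "Relative peak power")
invisible(dev.off())
```

Swap `demo` for `cmj` once the column names are confirmed. Opening a `pdf()` device with `width` and `height` is one way to fix the 4 x 8 inch size; exporting from the RStudio Plots pane at that size is another.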

Q2. Interpret the correlation plots from Q1 and explain what the data tell us about how concentric impulse and relative peak power each relate to body weight. [4 marks]

Put your answer here…….

Q3. PRINCIPAL COMPONENTS ANALYSIS
Create a PCA plot for the CMJ data, labelling each participant with their row number.
(NOTE: Jumper mass is not a jump variable and should be removed from the dataset before performing the PCA.) [6 marks]
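
A possible sketch using base R's `prcomp()` follows. It assumes the mass column is called `mass` (hypothetical) and that every other column is a numeric jump variable; the synthetic data frame stands in for `cmj`. Standardising with `scale. = TRUE` is a choice made here because the jump variables are measured in different units.

```r
# Synthetic stand-in for the CMJ data: hypothetical columns, NOT real measurements
set.seed(2)
demo <- data.frame(mass         = runif(20, 55, 95),
                   jump_height  = rnorm(20, 0.35, 0.05),
                   peak_power   = rnorm(20, 3500, 400),
                   conc_impulse = rnorm(20, 200, 25))

# Drop the non-jump variable (assumed to be called "mass") before the PCA
jump_vars <- demo[, setdiff(names(demo), "mass")]

# Centre and standardise each variable so differing units do not dominate the PCs
pca <- prcomp(jump_vars, center = TRUE, scale. = TRUE)

# Scores on PC1 vs PC2, with each point drawn as its row number
plot(pca$x[, 1], pca$x[, 2], type = "n", xlab = "PC1", ylab = "PC2")
text(pca$x[, 1], pca$x[, 2], labels = seq_len(nrow(jump_vars)), cex = 0.7)
```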

What percentage of the variance is explained by the PC1 and PC2 axes? [2 marks]
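
The proportion of variance carried by each principal component is its eigenvalue divided by the sum of all eigenvalues; `prcomp()` stores the square roots of those eigenvalues in `sdev`. A sketch on synthetic data (not the CMJ set):

```r
# Synthetic data only: four variables, one correlated with another
set.seed(3)
jump_vars <- data.frame(a = rnorm(25), b = rnorm(25), c = rnorm(25))
jump_vars$d <- jump_vars$a + rnorm(25, sd = 0.3)
pca <- prcomp(jump_vars, scale. = TRUE)

# Proportion of total variance per component: lambda_i / sum(lambda)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(100 * var_explained[1:2], 1)   # % for PC1 and PC2
summary(pca)$importance              # same figures (rounded), as a table
```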

Q4. OUTLIERS
Participants 83, 87 and 107 are outliers. For each of these, identify the single most influential jump variable that distinguishes them from the rest of the cohort. [3 marks]
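
One simple way to find the variable that sets a participant apart is to standardise every jump variable and take, for that row, the variable with the largest absolute z-score; inspecting the PCA loadings is another. The sketch below uses synthetic data with an outlier planted in row 7, and `most_extreme()` is a hypothetical helper.

```r
# Synthetic stand-in: hypothetical columns, NOT real data
set.seed(4)
jump_vars <- data.frame(jump_height = rnorm(30, 0.35, 0.04),
                        peak_power  = rnorm(30, 3500, 300),
                        rsi         = rnorm(30, 0.5, 0.08))
jump_vars$peak_power[7] <- 8000   # plant an obvious outlier in row 7

# Standardise each column: z = (x - mean) / sd
z <- scale(jump_vars)

# Hypothetical helper: for each row index, name the variable with the largest |z|,
# i.e. the one furthest from the cohort mean in standard-deviation units
most_extreme <- function(z, rows) {
  vapply(rows, function(i) colnames(z)[which.max(abs(z[i, ]))], character(1))
}

most_extreme(z, 7)
```

With the real data this would be called as `most_extreme(scale(jump_vars), c(83, 87, 107))`, after removing jumper mass as in Q3.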

Q5. STATISTICS
For participant 107, calculate the probability that their jump height is NOT statistically different from the rest of the cohort.
MAKE SURE TO DETAIL ANY ASSUMPTIONS MADE AND JUSTIFY YOUR ANSWER. [3 marks]
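
A minimal sketch of one possible approach: a z-score for the participant against the mean and standard deviation of the rest of the cohort, converted to a two-tailed probability. It assumes jump height is approximately normally distributed in the cohort (with a small cohort, a t-based prediction interval would be more cautious). The data are synthetic and `z_test_vs_cohort()` is a hypothetical helper.

```r
# Synthetic cohort jump heights in metres (NOT real data); row 5 is the participant of interest
set.seed(5)
jump_height <- rnorm(40, mean = 0.35, sd = 0.05)
jump_height[5] <- 0.52

# Hypothetical helper: compare participant i with the REST of the cohort,
# ASSUMING the cohort's jump heights are approximately normally distributed
z_test_vs_cohort <- function(x, i) {
  rest <- x[-i]
  z <- (x[i] - mean(rest)) / sd(rest)
  p <- 2 * pnorm(-abs(z))   # two-tailed probability under the normal assumption
  c(z = z, p = p)
}

z_test_vs_cohort(jump_height, 5)
```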



PART B - QUESTIONS ON COURSE CONTENT

Make sure to show all workings in your answers (do not just quote a number).

Q6. DATA REGRESSION
In a data regression, when the sum of squared residuals (SSR) equals the total sum of squares (SST), this indicates that the y-variable has no dependence on the x-variable.
Explain why this is so. [4 marks]
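
The standard definitions make the link explicit. Here SSR is the residual sum of squares, as in the question (some texts use SSR for the regression sum of squares instead), and the fit is ordinary least squares with an intercept:

```latex
\begin{aligned}
\mathrm{SST} &= \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
\mathrm{SSR} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\mathrm{SST} &= \mathrm{SSR} + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2,
  \qquad R^2 = 1 - \frac{\mathrm{SSR}}{\mathrm{SST}} \\
\mathrm{SSR} = \mathrm{SST} &\;\Rightarrow\; R^2 = 0
  \;\text{and}\; \hat{y}_i = \bar{y}\ \ \forall i
  \;\Rightarrow\; \hat{y} = \bar{y} + 0 \cdot x
\end{aligned}
```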

Q7. TIME-SERIES DATA
During a running stride the foot-strike phase lasts for 150-300 ms. What is the minimum sensor data-acquisition frequency required to capture accurate data for the foot-strike duration? [4 marks]
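
A sketch of the Nyquist arithmetic in R. It assumes, as a simplification, that one foot-strike is treated as one period of the signal; real ground-contact signals contain higher-frequency content, so the figure it gives is a lower bound rather than a recommendation. The helper names are hypothetical, and the 100 Hz rate in the usage line is only an example.

```r
# Hypothetical helper: Nyquist minimum sampling rate (Hz) for an event lasting
# t_event seconds, ASSUMING one event is treated as one period of the signal
nyquist_min_hz <- function(t_event) 2 / t_event

# Hypothetical helper: timing uncertainty of roughly one sample (1/fs seconds),
# expressed as a percentage of the event duration
timing_error_pct <- function(fs, t_event) 100 * (1 / fs) / t_event

nyquist_min_hz(0.150)          # shortest foot-strike; the 300 ms case needs half this
timing_error_pct(100, 0.150)   # e.g. an example 100 Hz sensor
```

The shortest phase (150 ms) sets the requirement, because a shorter event corresponds to a higher frequency.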

Q8. MULTIVARIATE TECHNIQUES
Why is the loss of data variance when performing PCA an issue? [4 marks]
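
Under the usual PCA set-up (mean-centred n x p data matrix X, sample-covariance eigenvalues ordered so that the first is largest, V_k holding the first k eigenvectors), the variance discarded by keeping k components equals the average squared reconstruction error, which is one way to quantify what is lost:

```latex
\begin{aligned}
\text{retained} &= \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i}, \qquad
\text{lost} = \frac{\sum_{i=k+1}^{p} \lambda_i}{\sum_{i=1}^{p} \lambda_i} \\
\frac{1}{n-1}\,\bigl\lVert X - X V_k V_k^{\top} \bigr\rVert_F^2 &= \sum_{i=k+1}^{p} \lambda_i
\end{aligned}
```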

Q9. MACHINE LEARNING
In a drug-screening test, 1,000 athletes are tested for a banned substance. The true-positive rate of the test is 0.95. Assume that, in a random sample, on average 1% of athletes are taking the banned substance.
Athlete X tests positive. What is the probability that they have actually used the banned substance? [4 marks]
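
A minimal Bayes' theorem sketch. The question gives the true-positive rate (0.95) and the prevalence (1%) but not the false-positive rate; the code ASSUMES it is 1 - 0.95 = 0.05, which is a labelled assumption rather than part of the question. `posterior_user()` is a hypothetical helper, and the expected-count version over 1,000 athletes gives the same result.

```r
# Bayes' theorem: P(user | positive) = P(positive | user) P(user) / P(positive)
# ASSUMPTION: false-positive rate = 1 - 0.95 = 0.05 (not stated in the question)
posterior_user <- function(sensitivity, prevalence, false_pos_rate) {
  p_pos <- sensitivity * prevalence + false_pos_rate * (1 - prevalence)
  sensitivity * prevalence / p_pos
}

p <- posterior_user(sensitivity = 0.95, prevalence = 0.01, false_pos_rate = 0.05)

# Same calculation as expected counts in 1,000 athletes
n <- 1000
true_pos  <- 0.95 * 0.01 * n   # users who test positive
false_pos <- 0.05 * 0.99 * n   # non-users who test positive (under the assumption)
true_pos / (true_pos + false_pos)
```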