This assignment is due on 2024-03-12 (Tue) 23:59, and is out of 100 marks (10% of final grade).
Data is available on Moodle as “nmj_formation.csv”.
All statistical tests should be assessed at a significance level of 0.05.
All your code, output, and answers to questions should be inputted into this single R markdown (.Rmd) file.
All your input should be made within the code/input cells. If you wish to open new input cells, you may do so by 1) clicking “Insert” at the top of the code editor -> “Executable cell” -> “R” for R code input cells, or 2) clicking “Insert” at the top of the code editor -> “Code Block” -> “OK” for general text input.
Please rename your R markdown file as “yourUID-yourName-A2.Rmd”, and knit your Rmd file to the HTML format by clicking the “Knit” button at the top of the code editor.
You should upload the HTML file for submission to Moodle.
You are a research assistant in a laboratory specializing in molecular neuroscience research. Your senior, with an interest in neuromuscular junctions (NMJs), has recently acquired some data from cell culture and imaging work. After performing the quantification, they have passed the results (“nmj_formation.csv”) to you for statistical analysis. Answer the questions below using this dataset.
“nmj_formation.csv” has 68 independent observations and 8 variables:
Please input your student information here:
Name: Chu Chi GAi
UID: 3035928926
This step serves to set up a proper R environment. Write scripts to:
Install (if needed) and load the packages required
(dplyr, ggplot2). (3 marks)
Import the dataset “nmj_formation.csv” and assign it to a variable of your choice. (2 marks)
Reminder: You do not need to set the working directory in an R markdown file.
# Write your codes for the "initialization" section here
library(dplyr) # loading the dplyr package
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2) # loading the ggplot2 package
`nmj_formation.(1)` <- read.csv("~/BMED 3603 R studio/BMED 3603(1)/nmj_formation (1).csv")
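The setup task also asks for packages to be installed if needed, which the chunk above does not do. A minimal sketch of one way to handle that, assuming a hypothetical helper named `load_or_install` (it is not part of the assignment template):

```r
# Hypothetical helper: install a package only if it is missing, then attach it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call never triggers an installation.
load_or_install("stats")

# For the assignment, the same helper would be called as:
# for (pkg in c("dplyr", "ggplot2")) load_or_install(pkg)
```

Because an R Markdown file is knitted from its own folder, a relative path such as `read.csv("nmj_formation.csv")` should be enough when the CSV sits next to the .Rmd file; this assumes the file keeps its original name.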
We want to first perform some exploratory data analysis. Write scripts to:
(a) Display the first 20 rows of the dataset. (1 mark)
# Write your codes for Question 1a here
head(`nmj_formation.(1)`, 20)
## Condition Track_area AChR_area AChR_norm_int AChR_raw_int Thres_min
## 1 Healthy 2400 2659 7182.616 17238279 4214
## 2 Healthy 704 1020 6151.220 4330459 4136
## 3 Healthy 4436 2630 5368.273 23813661 5552
## 4 Healthy 632 1646 6491.331 4102521 4245
## 5 Healthy 1815 7705 5645.419 10246436 3791
## 6 Healthy 581 892 8943.172 5195983 3421
## 7 Healthy 1803 2077 6801.707 12263478 3698
## 8 Healthy 605 1171 6513.418 3940618 5085
## 9 Healthy 8474 4186 7755.468 65719836 3976
## 10 Healthy 1587 777 6386.356 10135147 4623
## 11 Healthy 2064 1270 5792.195 11955090 5362
## 12 Healthy 2384 2525 6448.973 15374352 6723
## 13 MG 3717 3841 4378.818 16276066 5232
## 14 MG 475 271 4740.349 2251666 5921
## 15 MG 7965 4769 4627.730 36859869 5508
## 16 MG 1944 1161 4499.120 8746290 7022
## 17 MG 1082 553 4635.850 5015990 7435
## 18 MG 1116 2487 5658.432 6314810 3993
## 19 MG 4870 2909 5391.479 26256505 5783
## 20 MG 6186 4755 4553.474 28167790 3030
## Thres_max Morphology
## 1 65535 spindle
## 2 65535 round
## 3 65535 spindle
## 4 65535 spindle
## 5 65535 round
## 6 65535 spindle
## 7 65535 round
## 8 65535 spindle
## 9 65535 branched
## 10 65535 branched
## 11 65535 branched
## 12 65535 branched
## 13 65535 round
## 14 65535 spindle
## 15 65535 spindle
## 16 65535 branched
## 17 65535 spindle
## 18 65535 round
## 19 65535 spindle
## 20 65535 spindle
(b) Determine if there are missing values in the whole dataset. Your script should output the total count of missing values. (2 marks)
# Write your codes for Question 1b here
sum(is.na(`nmj_formation.(1)`))
## [1] 0
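When the total is not zero, a per-column count shows where the missing values are. A small sketch on a made-up data frame (the values below are illustrative and are not taken from the dataset):

```r
# Made-up data frame with two missing values, for illustration only.
toy <- data.frame(
  Condition  = c("Healthy", "MG", NA, "DMD"),
  Track_area = c(2400, NA, 4436, 632)
)

total_missing  <- sum(is.na(toy))      # one number for the whole dataset
missing_by_col <- colSums(is.na(toy))  # one number per column

print(total_missing)
print(missing_by_col)
```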
(c) Reveal the data types for each column. (1 mark)
# Write your codes for Question 1c here
str(`nmj_formation.(1)`)
## 'data.frame': 68 obs. of 8 variables:
## $ Condition : chr "Healthy" "Healthy" "Healthy" "Healthy" ...
## $ Track_area : int 2400 704 4436 632 1815 581 1803 605 8474 1587 ...
## $ AChR_area : int 2659 1020 2630 1646 7705 892 2077 1171 4186 777 ...
## $ AChR_norm_int: num 7183 6151 5368 6491 5645 ...
## $ AChR_raw_int : int 17238279 4330459 23813661 4102521 10246436 5195983 12263478 3940618 65719836 10135147 ...
## $ Thres_min : int 4214 4136 5552 4245 3791 3421 3698 5085 3976 4623 ...
## $ Thres_max : int 65535 65535 65535 65535 65535 65535 65535 65535 65535 65535 ...
## $ Morphology : chr "spindle" "round" "spindle" "spindle" ...
(d) With reference to (c), what are
the data types for AChR_norm_int, Thres_min,
and Morphology? (3 marks)
# Write your answers for Question 1d here
AChR_norm_int: numeric ,
Thres_min: integer ,
Morphology: character
(e) Reveal the descriptive statistics (mean, median, 1st and 3rd quartiles, min, max) for each column. (1 mark)
# Write your codes for Question 1e here
summary(`nmj_formation.(1)`)
## Condition Track_area AChR_area AChR_norm_int
## Length:68 Min. : 475 Min. : 209 Min. :2202
## Class :character 1st Qu.: 1120 1st Qu.: 1137 1st Qu.:4223
## Mode :character Median : 2166 Median : 2087 Median :5137
## Mean : 3372 Mean : 2454 Mean :5191
## 3rd Qu.: 5006 3rd Qu.: 3526 3rd Qu.:6497
## Max. :13205 Max. :10299 Max. :8943
## AChR_raw_int Thres_min Thres_max Morphology
## Min. : 1679573 Min. :2033 Min. :65535 Length:68
## 1st Qu.: 6037586 1st Qu.:3370 1st Qu.:65535 Class :character
## Median :10608206 Median :4079 Median :65535 Mode :character
## Mean :16788745 Mean :4285 Mean :65535
## 3rd Qu.:24101421 3rd Qu.:5245 3rd Qu.:65535
## Max. :72082018 Max. :7435 Max. :65535
(f) Based on your answers for 1(a)-(e), do you think the dataset is ready for analysis? Explain. (2 marks)
# Write your answers for Question 1f here
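The dataset is not yet ready for analysis. There are no missing values (1b), but Condition and Morphology are stored as character variables (1c) and need to be converted into factors before they can be used as grouping variables in the statistical tests and plots.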
Your senior asked you to perform some data manipulation. Write scripts to:
(a) Convert the Condition and
Morphology columns into factor variables. (2 marks)
# Write your codes for Question 2a here
`nmj_formation.(1)`$Condition <- as.factor(`nmj_formation.(1)`$Condition)
`nmj_formation.(1)`$Morphology <- as.factor(`nmj_formation.(1)`$Morphology)
(b) What are the default orders for the
Condition and Morphology columns after
(a)? (2 marks)
# Write your answers for Question 2b here
Condition: "DMD" "Healthy" "MG"
Morphology: "branched" "round" "spindle"
(c) Reorder the factor levels so that the
Condition factor levels are ordered “Healthy”, “MG”, and
“DMD”, and the Morphology factor levels are ordered
“round”, “branched”, and “spindle”. You should also show proof that the
levels are now ordered in the desired sequence. (6 marks)
# Write your codes for Question 2c here
`nmj_formation.(1)`$Condition <- factor(`nmj_formation.(1)`$Condition, levels = c("Healthy", "MG", "DMD"))
`nmj_formation.(1)`$Morphology <- factor(`nmj_formation.(1)`$Morphology, levels = c("round", "branched", "spindle"))
levels(`nmj_formation.(1)`$Condition )
## [1] "Healthy" "MG" "DMD"
levels(`nmj_formation.(1)`$Morphology )
## [1] "round" "branched" "spindle"
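Reordering levels and declaring a factor ordinal are two different things. The sketch below, on made-up labels, compares `factor()` with `ordered()`: both give the requested level order, but only `ordered()` tells R that the levels are ranked, which changes the default contrasts used by `lm()` and `aov()` to polynomial ones.

```r
# Made-up condition labels, for illustration only.
condition <- c("MG", "Healthy", "DMD", "Healthy")

nominal <- factor(condition, levels = c("Healthy", "MG", "DMD"))
ordinal <- ordered(condition, levels = c("Healthy", "MG", "DMD"))

levels(nominal)      # same level order in both objects
is.ordered(nominal)  # FALSE: levels are reordered but still nominal
is.ordered(ordinal)  # TRUE: R now treats Healthy < MG < DMD
```

Condition and Morphology are nominal categories, so `factor()` with an explicit `levels` argument is usually the safer choice here.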
In order to analyse the effect of MG or DMD on the fluorescence intensity of AChR, normality should be assessed before we run the appropriate statistical test(s).
(a) Which column of AChR fluorescence intensity data
would you use to assess the effect of MG or DMD on the fluorescence
intensity of AChR, AChR_norm_int or
AChR_raw_int? Justify. (2 marks)
# Write your answers for Question 3a
AChR_norm_int, because it expresses the AChR fluorescence intensity per unit area of nerve-muscle contact. This removes the effect of the measured area, so the conditions can be compared fairly.
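The "per unit area" reading can be checked against the rows printed in 1(a). The sketch below copies the first three rows from that output and compares AChR_raw_int / Track_area with AChR_norm_int; it only shows that these rows are consistent with that normalization, not how the column was actually produced.

```r
# First three rows of the dataset, copied from the head() output in 1(a).
first_rows <- data.frame(
  Track_area    = c(2400, 704, 4436),
  AChR_raw_int  = c(17238279, 4330459, 23813661),
  AChR_norm_int = c(7182.616, 6151.220, 5368.273)
)

# Raw intensity divided by the contact area.
ratio <- first_rows$AChR_raw_int / first_rows$Track_area

# Differences are at the level of the rounding in the printed output.
max_diff <- max(abs(ratio - first_rows$AChR_norm_int))
print(max_diff)
```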
(b) Plot a density plot for your selected column of AChR fluorescence intensity data in (a). The density plot should include a title, appropriate labels for the x and y-axes. All data within the column, regardless of condition, should be presented within one density curve. (5 marks)
Hint: Check http://www.sthda.com/english/wiki/ggplot2-essentials for
ggplot2 support. Your codes should feature
geom_density().
# Write your codes for Question 3b here
`nmj_formation.(1)` %>% ggplot(aes(AChR_norm_int)) +
geom_density(fill = 'blue') +
labs (title = "Distribution of AChR fluorescence intensity ",
x = "AChR_norm_int",
y = "Density")
(c) Plot a histogram plot for your selected column of AChR fluorescence intensity data in (a). The histogram plot should include a title, appropriate labels for the x and y-axes, and the histograms should be coloured by condition. (5 marks)
Hint: Check http://www.sthda.com/english/wiki/ggplot2-essentials for
ggplot2 support. Your codes should feature
geom_histogram().
# Write your codes for Question 3c here
`nmj_formation.(1)` %>% ggplot(aes(AChR_norm_int, fill=Condition)) +
geom_histogram() +
labs (title = "Distribution of AChR fluorescence intensity ",
x = "AChR fluorescence intensity",
y = "Count")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
(d) Plot a Q-Q plot for your selected column of AChR fluorescence intensity data in (a). The Q-Q plot should include a title, appropriate labels for the x and y-axes, and a Q-Q reference line. (5 marks)
Hint: Check http://www.sthda.com/english/wiki/ggplot2-essentials for
ggplot2 support. Your codes should feature stat_qq() and
stat_qq_line.
`nmj_formation.(1)` %>% ggplot(aes(sample =AChR_norm_int)) +
stat_qq() +
stat_qq_line() +
labs (title = "Normal Q-Q plot of AChR fluorescence intensity",
x = "Theoretical quantiles",
y = "Sample quantiles")
(e) Assess, with an appropriate statistical test and p-value evidence, whether your selected column of AChR fluorescence intensity data in (a) follows the normal distribution. (3 marks)
# Write your codes for Question 3e here
shapiro.test(`nmj_formation.(1)`$AChR_norm_int)
##
## Shapiro-Wilk normality test
##
## data: `nmj_formation.(1)`$AChR_norm_int
## W = 0.97076, p-value = 0.1103
# Write your answers for Question 3e here
For the Shapiro-Wilk normality test,
H0: The data follows a normal distribution.
Ha: The data does not follow a normal distribution.
As the p-value (0.1103) is larger than 0.05, the null hypothesis cannot be rejected.
There is no evidence that the data deviates from a normal distribution, so it can be treated as normally distributed.
Your senior asked you to first assess the effect of MG or DMD on the
normalized AChR fluorescence intensity. Assuming the column data for
AChR_norm_int is normally distributed, write scripts
to:
(a) Calculate the variances of the column data for
AChR_norm_int for each condition. (3 marks)
Hint: Use var() and think about how to index
dataframes.
# Write your codes for Question 4a here
var(`nmj_formation.(1)`$AChR_norm_int[`nmj_formation.(1)`$Condition == "MG"])
## [1] 1192963
var(`nmj_formation.(1)`$AChR_norm_int[`nmj_formation.(1)`$Condition == "DMD"])
## [1] 749032.6
var(`nmj_formation.(1)`$AChR_norm_int[`nmj_formation.(1)`$Condition == "Healthy"])
## [1] 620593.5
(b) Assess, with an appropriate statistical test and p-value evidence, whether the data is of equal variances. (2 marks)
# Write your codes for Question 4b here
#var.test(`nmj_formation.(1)`$AChR_norm_int)
var.test(`nmj_formation.(1)`$AChR_norm_int[`nmj_formation.(1)`$Condition == "MG"], `nmj_formation.(1)`$AChR_norm_int[`nmj_formation.(1)`$Condition == "DMD"])
##
## F test to compare two variances
##
## data: `nmj_formation.(1)`$AChR_norm_int[`nmj_formation.(1)`$Condition == "MG"] and `nmj_formation.(1)`$AChR_norm_int[`nmj_formation.(1)`$Condition == "DMD"]
## F = 1.5927, num df = 19, denom df = 18, p-value = 0.3288
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.618171 4.054475
## sample estimates:
## ratio of variances
## 1.592671
var.test(`nmj_formation.(1)`$AChR_norm_int[`nmj_formation.(1)`$Condition == "MG"], `nmj_formation.(1)`$AChR_norm_int[`nmj_formation.(1)`$Condition == "Healthy"])
##
## F test to compare two variances
##
## data: `nmj_formation.(1)`$AChR_norm_int[`nmj_formation.(1)`$Condition == "MG"] and `nmj_formation.(1)`$AChR_norm_int[`nmj_formation.(1)`$Condition == "Healthy"]
## F = 1.9223, num df = 19, denom df = 28, p-value = 0.1132
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.8541493 4.6340519
## sample estimates:
## ratio of variances
## 1.922293
var.test(`nmj_formation.(1)`$AChR_norm_int[`nmj_formation.(1)`$Condition == "DMD"], `nmj_formation.(1)`$AChR_norm_int[`nmj_formation.(1)`$Condition == "Healthy"])
##
## F test to compare two variances
##
## data: `nmj_formation.(1)`$AChR_norm_int[`nmj_formation.(1)`$Condition == "DMD"] and `nmj_formation.(1)`$AChR_norm_int[`nmj_formation.(1)`$Condition == "Healthy"]
## F = 1.207, num df = 18, denom df = 28, p-value = 0.6389
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.531601 2.970739
## sample estimates:
## ratio of variances
## 1.206962
# Write your answers for Question 4b here
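The null hypothesis is that the two groups compared have equal variances.
The alternative hypothesis is that the two groups compared do not have equal variances.
All three pairwise F tests have p-values larger than 0.05 (MG vs DMD: 0.3288; MG vs Healthy: 0.1132; DMD vs Healthy: 0.6389), so the null hypothesis cannot be rejected for any pair and the three conditions can be treated as having equal variances.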
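Three pairwise F tests can be replaced by a single test across all groups. A minimal sketch using `bartlett.test()` on made-up data; the column names `value` and `group` are hypothetical, and Bartlett's test is used here as one possible choice (it assumes normality within groups).

```r
set.seed(1)  # made-up data, for illustration only
toy <- data.frame(
  value = c(rnorm(20, mean = 5), rnorm(20, mean = 3), rnorm(20, mean = 2)),
  group = factor(rep(c("Healthy", "MG", "DMD"), each = 20))
)

# H0: all groups share the same variance.
res <- bartlett.test(value ~ group, data = toy)
print(res)
```

On the assignment data, the equivalent call would presumably be `bartlett.test(AChR_norm_int ~ Condition, data = ...)`, giving one p-value for the three conditions together.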
(c) Assuming the column data for
AChR_norm_int is of equal variances, assess, with
appropriate statistical tests and p-value evidence:
Whether the different conditions have an effect on the normalized AChR fluorescence intensity. (3 marks)
Which condition(s) have significantly affected the normalized AChR fluorescence intensity, and how the condition(s) have affected the normalized AChR fluorescence intensity. (3 marks)
# Write your codes for Question 4c here
AChR_norm_int_Condition.aov <- aov(AChR_norm_int ~ Condition, data = `nmj_formation.(1)`)
summary(AChR_norm_int_Condition.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## Condition 2 131060416 65530208 79.58 <2e-16 ***
## Residuals 65 53525494 823469
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(AChR_norm_int_Condition.aov)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = AChR_norm_int ~ Condition, data = `nmj_formation.(1)`)
##
## $Condition
## diff lwr upr p adj
## MG-Healthy -2395.5717 -3028.211 -1762.93208 0.0000000
## DMD-Healthy -3125.1453 -3767.563 -2482.72806 0.0000000
## DMD-MG -729.5736 -1426.863 -32.28448 0.0382685
# Write your answers for Question 4c here
One-way ANOVA can be performed.
H0: The mean normalized AChR fluorescence intensity is the same in all conditions.
Ha: The mean normalized AChR fluorescence intensity differs in at least one condition.
As Pr(>F) < 2e-16 (< 0.05), the null hypothesis is rejected.
The conditions have a significant effect on the normalized AChR fluorescence intensity.
Then we perform Tukey's test.
From Tukey's test, all three pairwise comparisons have adjusted p-values below 0.05 (MG-Healthy and DMD-Healthy: < 0.0001; DMD-MG: 0.0383). Both MG and DMD significantly reduce the normalized AChR fluorescence intensity relative to Healthy (mean differences of -2395.6 and -3125.1 respectively), and the intensity in DMD is also significantly lower than in MG (mean difference of -729.6).
(d) Identify the outlier(s), if any, in the column
data for AChR_norm_int. You only need to state the row
number that corresponds to the outlier data. (2 marks)
Hint: Utilize the outputs you have obtained from (c).
# Write your codes for Question 4d here
plot(AChR_norm_int_Condition.aov, 1)
# Write your answers for Question 4d here
The row numbers labelled as potential outliers in the residuals vs fitted plot are 6, 46 and 62.
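By default, `plot()` on a fitted model labels the three most extreme residuals (`id.n = 3`), whether or not they are unusually large. A sketch of a numerical check using standardized residuals on made-up data; the cut-off of 2 is a common rule of thumb, not a requirement of the assignment.

```r
# Made-up data with one obviously extreme value (row 5), for illustration only.
toy <- data.frame(
  value = c(10, 11, 9, 10, 30, 20, 21, 19, 20, 20),
  group = factor(rep(c("A", "B"), each = 5))
)

fit <- aov(value ~ group, data = toy)

# Standardized residuals; |value| > 2 is a rule of thumb, not a fixed rule.
std_res <- rstandard(fit)
flagged <- unname(which(abs(std_res) > 2))
print(flagged)
```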
(e) Assuming the column data for
AChR_norm_int is not normally distributed, assess, with an
appropriate statistical test and p-value evidence, whether the different
conditions have an effect on the normalized AChR fluorescence intensity.
(2 marks)
# Write your codes for Question 4e here
kruskal.test(AChR_norm_int ~ Condition, data = `nmj_formation.(1)`)
##
## Kruskal-Wallis rank sum test
##
## data: AChR_norm_int by Condition
## Kruskal-Wallis chi-squared = 49.813, df = 2, p-value = 1.525e-11
# Write your answers for Question 4e here
The Kruskal-Wallis test is used.
H0: The distribution of normalized AChR fluorescence intensity is the same in all conditions.
Ha: The distribution of normalized AChR fluorescence intensity differs in at least one condition.
As the p-value (1.525e-11) is much less than 0.05, the null hypothesis is rejected. The different conditions have an effect on the normalized AChR fluorescence intensity.
As previous literature has also suggested a role of MG or DMD in muscle development and integrity, your senior has further asked you to assess the effect of MG or DMD on the muscle cell morphology. Write scripts to:
(a) Plot a bar graph for the Morphology
data. The bar graph should include a title, appropriate labels for the x
and y-axes, and should display the distribution of muscle cell
morphology within each condition. (3 marks)
Hint: Check http://www.sthda.com/english/wiki/ggplot2-essentials for
ggplot2 support. Your codes should feature
geom_bar().
# Write your codes for Question 5a here
`nmj_formation.(1)` %>% ggplot(aes(x = Condition, fill = Morphology)) +
geom_bar() +
labs (title = "The distribution of muscle cell morphology within each condition",
x = "Condition",
y = "Count")
(b) Briefly describe two observations from the bar graph generated in (a). (2 marks)
# Write your answers for Question 5b here
Among the three morphologies, spindle is the most common in the MG condition, while branched is the most common in the Healthy condition.
(c) Assess, with an appropriate statistical test and p-value evidence, whether MG or DMD has an effect on muscle cell morphology. (4 marks)
Hint: Think about what statistical test you should use to assess the relationship between two categorical variables.
# Write your codes for Question 5c here
chi_sq <- chisq.test(table(`nmj_formation.(1)`$Morphology, `nmj_formation.(1)`$Condition))
## Warning in chisq.test(table(`nmj_formation.(1)`$Morphology,
## `nmj_formation.(1)`$Condition)): Chi-squared approximation may be incorrect
chi_sq
##
## Pearson's Chi-squared test
##
## data: table(`nmj_formation.(1)`$Morphology, `nmj_formation.(1)`$Condition)
## X-squared = 7.3697, df = 4, p-value = 0.1176
# Write your answers for Question 5c here
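H0: Muscle cell morphology is independent of condition.
Ha: Muscle cell morphology is not independent of condition.
As the p-value (0.1176) is larger than 0.05, the null hypothesis cannot be rejected. There is no significant evidence that MG or DMD has an effect on muscle cell morphology.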
(d) Your senior suggested running a Chi-square test to evaluate the effect of different conditions on muscle cell morphology. Display the observed and expected values. Is the chi-square test an appropriate test to run? Explain. (3 marks)
Hint: Think about the requirements that should be fulfilled in order for a Chi-square test to be reliably run.
# Write your codes for Question 5d here
chi_sq <- chisq.test(table(`nmj_formation.(1)`$Morphology, `nmj_formation.(1)`$Condition))
## Warning in chisq.test(table(`nmj_formation.(1)`$Morphology,
## `nmj_formation.(1)`$Condition)): Chi-squared approximation may be incorrect
chi_sq$expected
##
## Healthy MG DMD
## round 7.25000 5.000000 4.750000
## branched 10.66176 7.352941 6.985294
## spindle 11.08824 7.647059 7.264706
# Write your answers for Question 5d here
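A warning occurred when the chi-square test was run because not all expected values are at least 5: the expected count for round morphology in the DMD condition is 4.75. The chi-square approximation may therefore be unreliable, and a test that does not rely on large expected counts, such as Fisher's exact test, would be more appropriate.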
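The question also asks for the observed values, which are stored in the same test object as the expected ones. A sketch on a made-up table of counts, with Fisher's exact test shown as one possible alternative when expected counts are small:

```r
# Made-up 3 x 3 table of counts, for illustration only.
counts <- matrix(
  c(3, 4, 5,
    2, 4, 3,
    1, 5, 3),
  nrow = 3,
  dimnames = list(
    Morphology = c("round", "branched", "spindle"),
    Condition  = c("Healthy", "MG", "DMD")
  )
)

chi <- suppressWarnings(chisq.test(counts))
print(chi$observed)                  # counts as entered
print(chi$expected)                  # counts expected under independence
low_cells <- sum(chi$expected < 5)   # number of cells with expected count < 5

# Fisher's exact test does not rely on the large-sample approximation.
fisher <- fisher.test(counts)
print(fisher$p.value)
```

On the assignment data, `chi_sq$observed` would display the observed table next to `chi_sq$expected`.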
Hint: This question tests your understanding of correlation and regression.
During the fluorescence imaging sessions, your senior observed that
larger areas of nerve-muscle contact tend to have larger areas of AChR
clustering. They became interested in the possible relationships between
nerve-muscle contact area (Track_area) with 1) AChR
clustering area (AChR_area), or 2) normalized AChR
fluorescence intensity (AChR_norm_int), and have asked you
to perform the appropriate relevant analyses. Write scripts to:
(ai) Assess, using a graphical approach of your
choice, whether the data in the Track_area column follows
normal distribution. The plot should include a title and appropriate
labels for the x and y-axes. (2 marks)
Hint: Check http://www.sthda.com/english/wiki/ggplot2-essentials for ggplot2 support.
# Write your codes for Question 6ai here
`nmj_formation.(1)` %>% ggplot(aes(sample =Track_area)) +
stat_qq() +
stat_qq_line() +
labs (title = "Normal Q-Q plot of nerve-muscle contact area (Track_area)",
x = "Theoretical quantiles",
y = "Sample quantiles")
# Write your answers for Question 6ai here
A Q-Q plot is used to check normality. In the plot above, the points deviate from the reference line because the data is skewed, so Track_area does not follow a normal distribution.
(aii) Assess, using a graphical approach of your
choice, whether the data in the AChR_area column follows
normal distribution. The plot should include a title and appropriate
labels for the x and y-axes. (2 marks)
Hint: Check http://www.sthda.com/english/wiki/ggplot2-essentials for ggplot2 support.
# Write your codes for Question 6aii here
`nmj_formation.(1)` %>% ggplot(aes(sample =AChR_area)) +
stat_qq() +
stat_qq_line() +
labs (title = "Normal Q-Q plot of AChR clustering area (AChR_area)",
x = "Theoretical quantiles",
y = "Sample quantiles")
# Write your answers for Question 6aii here
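A Q-Q plot is used to check normality. Most points lie close to the reference line, but AChR_area is right-skewed (mean 2454 > median 2087, maximum 10299 in 1(e)) and the Shapiro-Wilk test in 6(b) rejects normality for the untransformed data (W = 0.87394, p-value = 5.483e-06), so AChR_area should not be treated as normally distributed.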
(b) After discussing your preliminary analyses in
(a) with your senior, they suggested that you
attempt two data transformation approaches (1. Logarithmic, 2. Square
root) for the AChR_area data.
Perform the data transformation as requested by your senior, and assess the normality for each set of the transformed data using one graphical approach and one statistical approach (i.e. one graphical and one statistical for logarithmic transformation + one graphical and one statistical approach for square root transformation). The graphical plots should include a title and appropriate labels for the x and y-axes.
Which transformation approach is better? Explain in brief with reference to your graphical and statistical outputs. (10 marks)
Hint: Think about how you can perform logarithmic or square root
calculations for any numerical/integer value (e.g. 10) in RStudio, then
simply apply the same approach to the AChR_area data
column. For the graphical approach, check http://www.sthda.com/english/wiki/ggplot2-essentials for
ggplot2 support.
# Write your codes for Question 6b here
`nmj_formation.(1)` %>% ggplot(aes(AChR_area)) +
geom_density(fill = "blue", alpha = 0.4) +
labs (title = "Distribution of AChR clustering area",
x = "AChR clustering area",
y = "Density")
`nmj_formation.(1)` %>% ggplot(aes(log(AChR_area))) +
geom_density(fill = "blue", alpha = 0.4) +
labs (title = "Distribution of logarithmic of AChR clustering area",
x = "Logarithm of AChR clustering area",
y = "Density")
# sqrt transformation with sqrt()
`nmj_formation.(1)`%>% ggplot(aes(sqrt(AChR_area))) +
geom_density(fill = "blue", alpha = 0.4) +
labs (title = "Distribution of sqrt of AChR clustering area",
x = "sqrt of AChR clustering area",
y = "Density")
`nmj_formation.(1)` %>% ggplot(aes(AChR_area^(1/3))) +
geom_density(fill = "blue", alpha = 0.4) +
labs (title = "Distribution of cube_root of AChR clustering area",
x = "cube_root of AChR clustering area",
y = "Density")
shapiro.test((`nmj_formation.(1)`$AChR_area))
##
## Shapiro-Wilk normality test
##
## data: (`nmj_formation.(1)`$AChR_area)
## W = 0.87394, p-value = 5.483e-06
shapiro.test(log(`nmj_formation.(1)`$AChR_area))
##
## Shapiro-Wilk normality test
##
## data: log(`nmj_formation.(1)`$AChR_area)
## W = 0.96657, p-value = 0.06484
shapiro.test(sqrt(`nmj_formation.(1)`$AChR_area))
##
## Shapiro-Wilk normality test
##
## data: sqrt(`nmj_formation.(1)`$AChR_area)
## W = 0.97594, p-value = 0.2115
shapiro.test((`nmj_formation.(1)`$AChR_area)^(1/3))
##
## Shapiro-Wilk normality test
##
## data: (`nmj_formation.(1)`$AChR_area)^(1/3)
## W = 0.98745, p-value = 0.7295
# Write your answers for Question 6b here
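Of the two requested transformations, the square root is better: its density curve is closer to a bell shape and its Shapiro-Wilk result is further from rejecting normality (W = 0.97594, p-value = 0.2115) than that of the logarithmic transformation (W = 0.96657, p-value = 0.06484). A cube root transformation, tried as an additional option, gives a W value even closer to 1 (W = 0.98745, p-value = 0.7295), so the cube-root-transformed AChR clustering area is used in the following questions.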
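The comparison between transformations can be put in one small table instead of reading several separate outputs. A sketch on made-up right-skewed values; the vector `area` and the list name `transforms` are hypothetical.

```r
set.seed(1)
area <- rlnorm(68, meanlog = 7.5, sdlog = 0.8)  # made-up right-skewed values

transforms <- list(
  log  = log,
  sqrt = sqrt
)

# Shapiro-Wilk W and p-value for each transformed version of the data.
results <- sapply(transforms, function(f) {
  test <- shapiro.test(f(area))
  c(W = unname(test$statistic), p_value = test$p.value)
})
print(results)
```

A W closer to 1 and a larger p-value both point towards the transformation whose output is closer to normal.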
(c) Plot a scatter plot to visualize the
relationship between Track_area (logarithmic-transformed)
and AChR_area (your preferred transformation approach from
(b)). The scatter plot should include a title,
appropriate labels for the x and y-axes, and the points should be
coloured by condition. Track_area should be on the
x-axis.
Without considering the effect of different conditions, how would you describe the relationship between the two variables? (4 marks)
Hint: Check http://www.sthda.com/english/wiki/ggplot2-essentials for
ggplot2 support. Your codes should feature
geom_point().
# Write your codes for Question 6c here
`nmj_formation.(1)` %>%
ggplot(aes(log(Track_area),(AChR_area)^(1/3),
color=Condition)) +
geom_point()+
labs (title = "Relationship between log(Track_area) and cube root of AChR_area",
x = "log(Track_area)",
y = "Cube root of AChR_area")
# Write your answers for Question 6c here
Ignoring the condition, the two transformed variables show a positive linear relationship: AChR_area (cube-root-transformed) increases as Track_area (logarithmic-transformed) increases.
(d) Plot a scatter plot to visualize the
relationship between Track_area (logarithmic-transformed)
and AChR_norm_int (no data transformation needed). The
scatter plot should include a title, appropriate labels for the x and
y-axes, and the points should be coloured by condition.
Track_area should be on the x-axis.
Without considering the effect of different conditions, how would you describe the relationship between the two variables? (4 marks)
Hint: Check http://www.sthda.com/english/wiki/ggplot2-essentials for
ggplot2 support. Your codes should feature
geom_point().
# Write your codes for Question 6d here
`nmj_formation.(1)`%>%
ggplot(aes(log(Track_area),
AChR_norm_int,
color = Condition)) +
geom_point() +
labs (title = "Relationship between log(Track_area) and AChR_norm_int",
x = "log(Track_area)",
y = "AChR_norm_int")
# Write your answers for Question 6d here
Ignoring the condition, there is no clear linear relationship between Track_area (logarithmic-transformed) and AChR_norm_int.
(e) Your senior asked you to provide quantitative evidence for the relationships you proposed in (c)-(d). Demonstrate, with the appropriate statistical analyses, that your proposed relationships are true. (4 marks)
Hint: Think about what statistical test you should use to assess the relationship between two continuous variables.
# Write your codes for Question 6e here
nmj_formation_AChR_area_cor2 <- cor.test(log(`nmj_formation.(1)`$Track_area),`nmj_formation.(1)`$AChR_norm_int, method = "pearson")
nmj_formation_AChR_area_cor1 <- cor.test((`nmj_formation.(1)`$AChR_area)^(1/3),log( `nmj_formation.(1)`$Track_area), method = "pearson")
nmj_formation_AChR_area_cor1
##
## Pearson's product-moment correlation
##
## data: (`nmj_formation.(1)`$AChR_area)^(1/3) and log(`nmj_formation.(1)`$Track_area)
## t = 9.9188, df = 66, p-value = 1.046e-14
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6562461 0.8544464
## sample estimates:
## cor
## 0.7736255
nmj_formation_AChR_area_cor2
##
## Pearson's product-moment correlation
##
## data: log(`nmj_formation.(1)`$Track_area) and `nmj_formation.(1)`$AChR_norm_int
## t = -1.5528, df = 66, p-value = 0.1253
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.40790385 0.05306323
## sample estimates:
## cor
## -0.187737
# Write your answers for Question 6e here
Null hypothesis: the true correlation is equal to 0.
Alternative hypothesis: the true correlation is not equal to 0.
For the relationship between Track_area (logarithmic-transformed) and AChR_area (cube-root-transformed):
From Pearson's correlation test, r = 0.774 and the p-value is 1.046e-14 < 0.05, so the null hypothesis is rejected.
Therefore, there is a significant, strong positive linear correlation between the two variables.
For the relationship between Track_area (logarithmic-transformed) and AChR_norm_int:
From Pearson's correlation test, r = -0.188 and the p-value is 0.1253 > 0.05, so the null hypothesis cannot be rejected.
Therefore, there is no significant evidence of a linear correlation between the two variables.
(f) Your senior would also like to predict future
values of AChR_area (your preferred transformation approach
from (b)) based on Track_area
(logarithmic-transformed). Perform the appropriate analysis and state
the equation that can be used for prediction. (2 marks)
Hint: First refer to (c) to get a preliminary understanding of the relationship between the two transformed variables. Is it linear? logarithmic? exponential?
Then think about what analysis you should perform to obtain the coefficients that build an equation. The equation should describe the said relationship (linear/logarithmic/exponential).
# Write your codes for Question 6f here
AChR_area_cube= (`nmj_formation.(1)`$AChR_area)^(1/3)
lm_AChR_area <- lm(log(Track_area)
~AChR_area_cube,
data = `nmj_formation.(1)`)
summary(lm_AChR_area)
##
## Call:
## lm(formula = log(Track_area) ~ AChR_area_cube, data = `nmj_formation.(1)`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7115 -0.2881 0.1506 0.3484 0.8902
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.16308 0.27115 19.041 < 2e-16 ***
## AChR_area_cube 0.20516 0.02068 9.919 1.05e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5542 on 66 degrees of freedom
## Multiple R-squared: 0.5985, Adjusted R-squared: 0.5924
## F-statistic: 98.38 on 1 and 66 DF, p-value: 1.046e-14
# Write your answers for Question 6f here
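The model fitted above has the logarithm of Track_area as the response and the cube root of AChR_area as the predictor, so the equation given by the output is log(Track_area) = 5.16308 + 0.20516 × AChR_area^(1/3). Since the question asks to predict AChR_area from Track_area, the model should be refitted with the two variables swapped, i.e. lm(AChR_area_cube ~ log(Track_area)), and the intercept and slope of that fit should be used in the prediction equation.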
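In an R model formula the response goes on the left of `~` and the predictor on the right, and predictions come out on the transformed scale. A sketch on made-up data, constructed so that the cube root of the area equals 1 + 2 × log(track) exactly; the numbers are illustrative and say nothing about the real coefficients.

```r
# Made-up data built so that cube root of area = 1 + 2 * log(track) exactly.
toy <- data.frame(Track_area = exp(c(1, 1.5, 2.5, 3, 4)))
toy$AChR_area <- (1 + 2 * log(toy$Track_area))^3

# Response on the left of ~, predictor on the right.
fit <- lm(I(AChR_area^(1/3)) ~ log(Track_area), data = toy)
coefs <- unname(coef(fit))   # intercept, then slope

# Predict on the cube-root scale, then cube to return to the original units.
new_obs   <- data.frame(Track_area = exp(2))
pred_cube <- unname(predict(fit, newdata = new_obs))
pred_area <- pred_cube^3
print(c(coefs, pred_area))
```

With the real data, the intercept and slope from `coef()` would go into an equation of the form AChR_area^(1/3) = intercept + slope × log(Track_area).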