Part I Setting Up, Libraries and Datasets
Interpretation: After loading the necessary libraries and importing the dataset into the script, we can analyse it further.
library(readxl)
library(ggpubr)
## Loading required package: ggplot2
DatasetC <- read_excel("/Users/sarva/Desktop/DatasetC.xlsx")
Part II Descriptive Statistics
After importing the data, we calculate the mean and standard deviation of each variable: the independent variable (IV), inches, and the dependent variable (DV), pounds.
mean(DatasetC$inches)
## [1] 69.27122
sd(DatasetC$inches)
## [1] 2.738448
mean(DatasetC$pounds)
## [1] 195.7736
sd(DatasetC$pounds)
## [1] 29.0096
Interpretation: The mean and standard deviation of the independent variable (inches) and the dependent variable (pounds) calculated here serve as reference values for the rest of the analysis.
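The four separate calls above can also be condensed into a single pass over both columns. The following is a minimal sketch, not the method used in this report: `example_data` is a simulated stand-in for `DatasetC` (its values are made up for illustration), and `summarise_columns` is a hypothetical helper name.

```r
# Simulated stand-in for DatasetC; values are illustrative only, not the real data.
set.seed(1)
example_data <- data.frame(
  inches = rnorm(100, mean = 69, sd = 3),
  pounds = rnorm(100, mean = 195, sd = 29)
)

# Hypothetical helper: mean and standard deviation for each named column.
summarise_columns <- function(data, columns) {
  sapply(data[columns], function(x) c(mean = mean(x), sd = sd(x)))
}

# Result is a matrix with rows "mean" and "sd" and one column per variable.
descriptives <- summarise_columns(example_data, c("inches", "pounds"))
print(round(descriptives, 2))
```

With the real data, the same helper could be called as `summarise_columns(DatasetC, c("inches", "pounds"))`.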
Part III Histogram Visualisation
In this part we plot histograms to inspect the shape of each variable's distribution. Normality is then tested formally with the Shapiro-Wilk test in Part IV.
hist(DatasetC$inches,
     main = "Inches",
     breaks = 20,
     col = "yellow",
     border = "black",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
hist(DatasetC$pounds,
     main = "Pounds",
     breaks = 20,
     col = "yellow",
     border = "black",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
Interpretation: After visualising the data, we can see that the histogram
for variable “inches” appears to be positively skewed, and the
histogram for variable “pounds” appears to be slightly positively skewed
as well.
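A visual impression of skew can be backed up with a number. The sketch below computes the moment-based sample skewness in base R; `skewness_moment` is a hypothetical helper (base R has no built-in skewness function), and the two example vectors are made up for illustration rather than taken from `DatasetC`. A value near 0 suggests symmetry, and a positive value suggests a longer right tail.

```r
# Hypothetical helper: moment-based sample skewness, g1 = m3 / m2^(3/2).
skewness_moment <- function(x) {
  x <- x[!is.na(x)]
  deviations <- x - mean(x)
  m2 <- mean(deviations^2)  # second central moment
  m3 <- mean(deviations^3)  # third central moment
  m3 / m2^(3 / 2)
}

# Illustrative vectors only; not taken from DatasetC.
symmetric_values <- c(1, 2, 3, 4, 5)
right_tailed_values <- c(1, 1, 2, 2, 3, 10)

skew_symmetric <- skewness_moment(symmetric_values)        # zero for symmetric data
skew_right_tailed <- skewness_moment(right_tailed_values)  # positive for a long right tail
print(c(symmetric = skew_symmetric, right_tailed = skew_right_tailed))
```

With the real data, this could be applied as `skewness_moment(DatasetC$inches)` and `skewness_moment(DatasetC$pounds)`.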
Part IV Correlational Analysis
shapiro.test(DatasetC$inches)
##
## Shapiro-Wilk normality test
##
## data: DatasetC$inches
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetC$pounds)
##
## Shapiro-Wilk normality test
##
## data: DatasetC$pounds
## W = 0.97289, p-value = 0.03691
cor.test(DatasetC$inches, DatasetC$pounds, method ="spearman")
##
## Spearman's rank correlation rho
##
## data: DatasetC$inches and DatasetC$pounds
## S = 166000, p-value = 0.9693
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.00390039
Interpretation: The Shapiro-Wilk test checks whether each variable is normally distributed. The p-value for variable “inches” is 0.9349 and the p-value for variable “pounds” is 0.03691. Using 0.05 as the threshold, we do not reject normality for “inches” (p > 0.05), while “pounds” departs significantly from a normal distribution (p < 0.05).
Interpretation: Because “pounds” is not normally distributed, Spearman's rank correlation is used rather than Pearson's. The result (rho = 0.0039, p-value = 0.9693) shows no significant monotonic relationship between inches and pounds.
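The decision rule applied here (use a parametric correlation only when both variables pass the normality check) can be written down explicitly. This is a minimal sketch of that rule, assuming a 0.05 threshold; `choose_cor_method` is a hypothetical helper, and the two p-values are copied from the Shapiro-Wilk output above.

```r
# Hypothetical helper: choose a correlation method from two Shapiro-Wilk p-values.
# Pearson is chosen only when neither variable rejects normality at `alpha`.
choose_cor_method <- function(p_x, p_y, alpha = 0.05) {
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# p-values copied from the Shapiro-Wilk output above.
p_inches <- 0.9349
p_pounds <- 0.03691

method <- choose_cor_method(p_inches, p_pounds)
print(method)  # "spearman", because p_pounds is below 0.05
```

The chosen value can then be passed on, for example `cor.test(DatasetC$inches, DatasetC$pounds, method = method)`.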
Part V Scatterplot Visualisation
ggscatter(
  DatasetC,
  x = "inches",
  y = "pounds",
  add = "reg.line",
  xlab = "inches",
  ylab = "pounds"
)
Interpretation: Plotting the two variables against each other shows that the points are scattered widely across the plot, with several extreme outliers and no clear trend, which is consistent with the near-zero correlation.
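Outliers judged by eye can also be flagged with a simple rule. The sketch below uses the common 1.5 × IQR convention, which is an assumption and not a rule stated in this report; `flag_iqr_outliers` is a hypothetical helper, and `example_pounds` is made-up data rather than values from `DatasetC`.

```r
# Hypothetical helper: flag values more than k * IQR beyond the quartiles.
flag_iqr_outliers <- function(x, k = 1.5) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  spread <- q[2] - q[1]  # interquartile range
  x < (q[1] - k * spread) | x > (q[2] + k * spread)
}

# Illustrative weights in pounds; not taken from DatasetC.
example_pounds <- c(180, 185, 190, 195, 200, 205, 210, 320)

# Only 320 lies outside the fences (162.5 to 232.5) for this example.
flagged_values <- example_pounds[flag_iqr_outliers(example_pounds)]
print(flagged_values)
```

With the real data, `DatasetC[flag_iqr_outliers(DatasetC$pounds), ]` would list the rows whose weight falls outside the fences.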
Reporting the results
Mean (inches) = 69.27122
Mean (pounds) = 195.7736
Standard deviation (inches) = 2.738448
Standard deviation (pounds) = 29.0096
Spearman's rho = 0.00390039 (p-value = 0.9693)