Part I Setting Up, Libraries and Datasets
Interpretation: We load the required libraries and import the dataset into the script so that it can be analysed further.
library("readxl")
library("ggpubr")
## Loading required package: ggplot2
DatasetA <- read_excel("/Users/sarva/Desktop/DatasetA.xlsx")
Part II Descriptive Statistics
Interpretation: With the dataset loaded into the environment, we calculate the mean and standard deviation of the independent variable (IV), Study Hours, and the dependent variable (DV), Exam Score.
mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
Interpretation: With the mean and standard deviation obtained for each variable, we can move on to visualising the data.
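The four separate calls above can also be collapsed into a single summary table. The following is a minimal sketch, assuming a data frame with the same two column names; the `demo` data frame is simulated stand-in data, not the contents of DatasetA.xlsx, so its numbers will differ from those reported here.

```r
# Simulated stand-in data (hypothetical, NOT the values in DatasetA.xlsx),
# used only so this sketch runs on its own.
set.seed(1)
demo <- data.frame(
  StudyHours = rnorm(100, mean = 6, sd = 1.4),
  ExamScore  = rnorm(100, mean = 90, sd = 7)
)

# Compute mean and standard deviation for every column in one call.
# sapply() returns a matrix with one column per variable.
descriptives <- sapply(demo, function(x) c(mean = mean(x), sd = sd(x)))

print(round(descriptives, 2))
```

With the real data, replacing `demo` by `DatasetA[, c("StudyHours", "ExamScore")]` should give the same means and standard deviations as the individual calls.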
Part III Histogram Visualisation
In this part we visualise histograms of both variables to inspect the shape of their distributions. Normality is then checked formally with the Shapiro-Wilk test.
hist(DatasetA$StudyHours,
     main = "study hours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
hist(DatasetA$ExamScore,
     main = "examscore",
     breaks = 20,
     col = "red",
     border = "black",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
Interpretation: On visual inspection, the histogram of study hours appears to be positively skewed, while the histogram of exam scores shows no obvious skew. Visual inspection is only a rough guide, so both variables are tested formally with the Shapiro-Wilk test in Part IV.
Part IV Correlational Analysis
shapiro.test(DatasetA$StudyHours)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
##
## Shapiro-Wilk normality test
##
## data: DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman" )
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.9008825
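Interpretation: The Shapiro-Wilk test was used to check whether each variable is normally distributed, and a correlation test was used to check whether the relationship between the two variables is statistically significant. For StudyHours the p-value is 0.9349 (p > 0.05), so there is no evidence of a departure from normality. For ExamScore the p-value is 0.006465 (p < 0.05), so exam scores are not normally distributed. Because one of the variables is not normally distributed, Spearman's rank correlation, which does not assume normality, is appropriate here. The result, rho = 0.9008825 with p-value < 2.2e-16, indicates a strong, statistically significant positive relationship between study hours and exam score. The warning about ties means only that the reported p-value is approximate rather than exact.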
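The decision described above (run Shapiro-Wilk on both variables, then pick the correlation method) can be sketched as a small helper. This is an illustrative sketch only: `choose_cor_method` is a hypothetical function name, the 0.05 threshold is the conventional cut-off assumed here, and the `hours` and `score` vectors are simulated, not taken from DatasetA.

```r
# Hypothetical helper: choose Pearson only when both variables pass
# the Shapiro-Wilk test at the given alpha, otherwise fall back to Spearman.
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Simulated stand-in data (NOT the values in DatasetA.xlsx).
set.seed(42)
hours <- rnorm(100, mean = 6, sd = 1.4)   # drawn from a normal distribution
score <- 100 - rexp(100, rate = 0.15)     # deliberately skewed

method <- choose_cor_method(hours, score)
result <- cor.test(hours, score, method = method)

print(method)
print(result$estimate)
```

With the real data the call would be `choose_cor_method(DatasetA$StudyHours, DatasetA$ExamScore)`, which, given the p-values reported above, selects "spearman".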
Part V Scatterplot Visualisation
To examine the relationship between the independent and dependent variables, we plot StudyHours against ExamScore in a scatterplot with a fitted regression line.
ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
)
Interpretation: The scatterplot shows a positive, approximately linear relationship between study hours and exam score, with no extreme outliers.
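For readers without ggpubr installed, a roughly equivalent plot can be drawn in base R. The sketch below is an assumption-laden illustration rather than the method used in this report: the `study_hours` and `exam_score` vectors are simulated stand-ins, and the line is an ordinary least-squares fit from `lm()`.

```r
# Simulated stand-in data (hypothetical, NOT the values in DatasetA.xlsx).
set.seed(7)
study_hours <- rnorm(100, mean = 6, sd = 1.4)
exam_score  <- 65 + 4 * study_hours + rnorm(100, sd = 3)

# Fit a simple linear regression of exam score on study hours.
fit <- lm(exam_score ~ study_hours)

# Scatterplot of the raw points with the fitted regression line on top.
plot(study_hours, exam_score,
     xlab = "StudyHours",
     ylab = "ExamScore",
     pch = 19)
abline(fit, col = "blue")
```

Points that sit far from the line relative to the rest would be the candidates to inspect as possible outliers.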
Reporting the results
Mean (StudyHours) = 6.135609
Standard Deviation (StudyHours) = 1.369224
Mean (ExamScore) = 90.06906
Standard Deviation (ExamScore) = 6.795224
Spearman's rho = 0.9008825
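When writing up, these values are usually rounded to two decimal places. A minimal sketch of a formatting helper follows; `report_stat` is a hypothetical name introduced for illustration, and the numbers passed to it are the values reported above.

```r
# Hypothetical helper: format a labelled statistic to a fixed number of decimals.
report_stat <- function(label, value, digits = 2) {
  paste0(label, " = ", formatC(value, format = "f", digits = digits))
}

# Values copied from the results reported above.
lines <- c(
  report_stat("Mean (StudyHours)", 6.135609),
  report_stat("SD (StudyHours)", 1.369224),
  report_stat("Mean (ExamScore)", 90.06906),
  report_stat("SD (ExamScore)", 6.795224),
  report_stat("Spearman's rho", 0.9008825)
)

cat(lines, sep = "\n")
```

Using `formatC` with `format = "f"` keeps trailing zeros, so a value such as 6.795224 prints as 6.80 rather than 6.8.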