Part I Setting Up Libraries and Datasets

Interpretation: After loading the necessary libraries and importing the dataset into the script, we can analyse it further.

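# readxl provides read_excel() for importing .xlsx files
library("readxl")
# ggpubr provides ggscatter(); it loads ggplot2 automatically
library("ggpubr")
## Loading required package: ggplot2
# Import the dataset (adjust the path to where the file is stored locally)
DatasetA <- read_excel("/Users/sarva/Desktop/DatasetA.xlsx")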
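Before computing any statistics it can help to confirm that the import worked as expected. The sketch below is not part of the original analysis: `check_dataset` is a hypothetical helper, and it assumes the two columns of interest are named `StudyHours` and `ExamScore`, as in the code above. It stops with an error if a column is missing, non-numeric, or contains missing values.

```r
# Hypothetical helper (not part of the original analysis): confirm that an
# imported data frame has the expected numeric columns and no missing values.
check_dataset <- function(df, required = c("StudyHours", "ExamScore")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  for (col in required) {
    if (!is.numeric(df[[col]])) stop("Column '", col, "' is not numeric")
    if (anyNA(df[[col]])) stop("Column '", col, "' contains missing values")
  }
  invisible(TRUE)
}

# Usage with a small made-up data frame; with the real data this would be
# check_dataset(DatasetA)
toy <- data.frame(StudyHours = c(4.5, 6.0, 7.5), ExamScore = c(82, 90, 97))
check_dataset(toy)
```

Because `read_excel()` returns a tibble, which is also a list of columns, the same `names()` and `[[` lookups work on `DatasetA` directly.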

Part II Descriptive Statistics

Interpretation: With the dataset imported into the environment, we calculate the mean and standard deviation of both the independent variable (IV: study hours) and the dependent variable (DV: exam score).

mean(DatasetA$StudyHours)
## [1] 6.135609
sd(DatasetA$StudyHours)
## [1] 1.369224
mean(DatasetA$ExamScore)
## [1] 90.06906
sd(DatasetA$ExamScore)
## [1] 6.795224
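To make explicit what `mean()` and `sd()` compute, the sketch below recomputes both by hand on a few illustrative values (not DatasetA). Note that `sd()` returns the sample standard deviation, which divides the squared deviations by n - 1 rather than n. The `describe` function is a hypothetical convenience helper for getting both statistics for several columns at once.

```r
# Illustrative values only (not DatasetA)
x <- c(4, 5, 6, 7, 8)
n <- length(x)

# Sample mean: sum of the values divided by n
manual_mean <- sum(x) / n

# Sample standard deviation: squared deviations from the mean divided by
# n - 1 (Bessel's correction), then square-rooted; this matches sd()
manual_sd <- sqrt(sum((x - manual_mean)^2) / (n - 1))

stopifnot(isTRUE(all.equal(manual_mean, mean(x))),
          isTRUE(all.equal(manual_sd, sd(x))))

# Hypothetical helper: mean and SD for several columns in one call,
# e.g. describe(DatasetA, c("StudyHours", "ExamScore"))
describe <- function(df, cols) {
  sapply(df[cols], function(v) c(mean = mean(v), sd = sd(v)))
}
describe(data.frame(a = x, b = 2 * x), c("a", "b"))
```

`describe()` returns a small matrix with rows `mean` and `sd` and one column per variable, which keeps the four numbers above in a single table.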

Interpretation: Study hours have a mean of 6.14 (SD = 1.37) and exam scores a mean of 90.07 (SD = 6.80). With these summaries in hand, we move on to visualising the distribution of each variable.

Part III Histogram Visualisation

In this part we plot a histogram of each variable to inspect the shape of its distribution. Whether each variable is normally distributed is then tested formally with the Shapiro-Wilk test in Part IV.

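# Histogram of the independent variable (study hours); breaks = 20 is a
# suggested number of bins, which hist() may adjust
hist(DatasetA$StudyHours,
     main = "Study Hours",
     xlab = "Study hours",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

# Histogram of the dependent variable (exam score)
hist(DatasetA$ExamScore,
     main = "Exam Score",
     xlab = "Exam score",
     breaks = 20,
     col = "red",
     border = "black",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)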

Interpretation: Visual inspection of the two histograms suggests that the study hours data are positively skewed, while the shape of the exam score distribution is harder to judge from the plot alone. Because visual impressions can be misleading, normality is tested formally with the Shapiro-Wilk test in Part IV.
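The visual impression of skew can be supplemented with a single number. The sketch below computes the moment-based sample skewness g1 in base R; this is a swapped-in supplement rather than part of the original analysis, and `skewness_g1` is a hypothetical helper name. Positive values indicate a longer right tail, negative values a longer left tail, and values near zero a roughly symmetric distribution.

```r
# Hypothetical helper: moment-based sample skewness g1, i.e. the third
# central moment divided by the second central moment to the power 3/2
skewness_g1 <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  m3 / m2^(3 / 2)
}

# Illustrative values: a long right tail gives a positive result,
# perfectly symmetric values give zero
right_tailed <- c(1, 2, 2, 3, 3, 3, 10)
skewness_g1(right_tailed)
skewness_g1(c(1, 2, 3, 4, 5))

# With the real data: skewness_g1(DatasetA$StudyHours)
```

Dedicated packages offer bias-corrected variants of this statistic; the plain moment estimator is used here only because it needs no extra dependencies.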

Part IV Normality Testing and Correlational Analysis

shapiro.test(DatasetA$StudyHours)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$StudyHours
## W = 0.99388, p-value = 0.9349
shapiro.test(DatasetA$ExamScore)
## 
##  Shapiro-Wilk normality test
## 
## data:  DatasetA$ExamScore
## W = 0.96286, p-value = 0.006465
cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman" )
## Warning in cor.test.default(DatasetA$StudyHours, DatasetA$ExamScore, method =
## "spearman"): Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  DatasetA$StudyHours and DatasetA$ExamScore
## S = 16518, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9008825
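The warning above appears because some values are tied, so R cannot compute an exact p-value for Spearman's test and falls back to a large-sample approximation. The sketch below, on made-up values rather than DatasetA, shows two things: Spearman's rho is Pearson's correlation computed on the ranks (tied values share the average of their ranks), and passing `exact = FALSE` to `cor.test()` requests the approximation directly, so no warning is raised.

```r
# Illustrative data with a tie in x (not DatasetA)
x <- c(2, 4, 4, 5, 7, 9)
y <- c(60, 65, 70, 72, 80, 95)

# Spearman's rho is Pearson's correlation of the ranks;
# rank() gives tied values the average of the ranks they span
rho_from_ranks <- cor(rank(x), rank(y))
rho_builtin <- cor(x, y, method = "spearman")
all.equal(rho_from_ranks, rho_builtin)

# exact = FALSE asks for the approximate p-value explicitly,
# which avoids the "Cannot compute exact p-value with ties" warning
res <- cor.test(x, y, method = "spearman", exact = FALSE)
res$estimate
res$p.value
```

With the real data the equivalent call would be `cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = "spearman", exact = FALSE)`; the estimate of rho is unchanged by this option.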

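Interpretation: The Shapiro-Wilk test checks whether each variable is normally distributed; its null hypothesis is that the data are normal. For study hours, p = 0.9349 > 0.05, so normality is not rejected. For exam score, p = 0.006465 < 0.05, so the exam scores depart significantly from a normal distribution. Because one of the variables is not normally distributed, the non-parametric Spearman's rank correlation is used rather than Pearson's correlation. The result, rho = 0.90 (p < 2.2e-16), indicates a strong, positive and statistically significant monotonic relationship between study hours and exam score. The warning about ties means R used an approximate rather than an exact p-value; with a p-value this small, the conclusion is unaffected.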
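The Shapiro-Wilk results feed directly into the choice of correlation method. The sketch below encodes that decision, assuming the conventional threshold alpha = 0.05: if both variables pass the normality test, Pearson is chosen, otherwise Spearman. `choose_method` is a hypothetical helper, not part of the original analysis. With the p-values reported above (0.9349 for study hours, 0.006465 for exam score), this rule selects Spearman.

```r
# Hypothetical helper: pick the correlation method from Shapiro-Wilk results.
# Assumes the conventional alpha = 0.05 threshold.
choose_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value
  p_y <- shapiro.test(y)$p.value
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}

# Illustrative inputs: theoretical normal quantiles look normal,
# exponential quantiles are strongly right-skewed
normal_like <- qnorm(ppoints(50))
skewed <- qexp(ppoints(50))
choose_method(normal_like, normal_like)
choose_method(normal_like, skewed)

# With the real data:
# method <- choose_method(DatasetA$StudyHours, DatasetA$ExamScore)
# cor.test(DatasetA$StudyHours, DatasetA$ExamScore, method = method)
```

The rule is a simplification: with large samples the Shapiro-Wilk test can flag small, practically unimportant departures from normality, so it is worth checking the histograms as well.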

Part V Scatterplot Visualisation

To visualise the relationship between the independent and dependent variables, we draw a scatterplot with a fitted regression line.

ggscatter(
  DatasetA,
  x = "StudyHours",
  y = "ExamScore",
  add = "reg.line",
  xlab = "StudyHours",
  ylab = "ExamScore"
) 

Interpretation: The scatterplot shows a positive, approximately linear relationship between study hours and exam score, with no extreme outliers visible.
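The claim of "no extreme outliers" can be checked numerically as well as visually. The regression line drawn by `add = "reg.line"` is, to our understanding, an ordinary least-squares linear fit, so the sketch below fits the same model with `lm()` and flags points whose standardised residual exceeds 3 in absolute value, a common rule-of-thumb threshold that we assume here. The data are illustrative, not DatasetA.

```r
# Illustrative data (not DatasetA)
df <- data.frame(
  StudyHours = c(3, 4, 5, 6, 7, 8, 9),
  ExamScore  = c(78, 82, 85, 90, 93, 96, 99)
)

# Least-squares fit of exam score on study hours
fit <- lm(ExamScore ~ StudyHours, data = df)
coef(fit)

# Indices of points with |standardised residual| > 3 (assumed threshold);
# an empty result means no point is flagged
outliers <- which(abs(rstandard(fit)) > 3)
outliers

# With the real data: lm(ExamScore ~ StudyHours, data = DatasetA)
```

Note that with very few points the standardised residuals are mathematically bounded, so this check is only informative for reasonably sized samples such as the full dataset.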

Reporting the results

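Mean (study hours) = 6.135609, Standard Deviation (study hours) = 1.369224
Mean (exam score) = 90.06906, Standard Deviation (exam score) = 6.795224
Shapiro-Wilk: study hours W = 0.99388, p = 0.9349; exam score W = 0.96286, p = 0.006465
Spearman's rank correlation: rho = 0.9008825, p < 2.2e-16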
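Copying numbers by hand into a report is error-prone. The sketch below builds the summary line directly from the data, rounding to two decimal places and reporting very small p-values as "p < .001". `report_results` is a hypothetical helper, and the rounding and wording conventions are assumptions rather than requirements of the analysis. It passes `exact = FALSE` so the Spearman p-value is the same large-sample approximation R falls back to when values are tied.

```r
# Hypothetical helper: one-line summary of descriptive statistics and the
# Spearman correlation, rounded to two decimal places
report_results <- function(x, y, x_name = "Study hours", y_name = "Exam score") {
  # exact = FALSE: use the approximate p-value, as R does when ties exist
  ct <- cor.test(x, y, method = "spearman", exact = FALSE)
  p_txt <- if (ct$p.value < 0.001) "p < .001" else sprintf("p = %.3f", ct$p.value)
  sprintf(
    "%s: M = %.2f, SD = %.2f; %s: M = %.2f, SD = %.2f; Spearman's rho = %.2f, %s",
    x_name, mean(x), sd(x), y_name, mean(y), sd(y), unname(ct$estimate), p_txt
  )
}

# Illustrative data (not DatasetA)
s <- report_results(1:10, c(1, 3, 2, 5, 4, 7, 6, 9, 8, 10))
s

# With the real data: report_results(DatasetA$StudyHours, DatasetA$ExamScore)
```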