library(readxl)
library(ggpubr)
## Loading required package: ggplot2
dataset <- read_excel("C:/Users/laksh/Desktop/r studio/dataset.xlsx")
ggscatter(
  dataset,
  x = "age",
  y = "education",
  add = "reg.line",
  xlab = "age",
  ylab = "education"
)
The scatterplot with the fitted regression line shows a linear, positive,
and strong relationship between age and education. There are no apparent
outliers.
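A visual judgement of "strong" can be backed by a correlation coefficient. The following is a minimal sketch, not part of the original analysis: it runs on simulated stand-in data (an assumption, so the block works without the Excel file), and the strength labels follow the common rule-of-thumb cutoffs of 0.3 and 0.5.

```r
# Simulated stand-in data (an assumption for illustration only); with the
# real data, pass dataset$age and dataset$education instead.
set.seed(42)
age <- rnorm(150, mean = 35, sd = 11)
education <- 10 + 0.1 * age + rnorm(150, sd = 1.5)

# Pearson correlation with a significance test and 95% confidence interval
ct <- cor.test(age, education, method = "pearson")
r <- unname(ct$estimate)

# Rule-of-thumb labels for the size of |r| (cutoffs 0.3 and 0.5)
strength <- if (abs(r) >= 0.5) {
  "strong"
} else if (abs(r) >= 0.3) {
  "moderate"
} else {
  "weak"
}

cat(sprintf("r = %.2f, 95%% CI [%.2f, %.2f], p = %.3g (%s)\n",
            r, ct$conf.int[1], ct$conf.int[2], ct$p.value, strength))
```

Reporting r with its confidence interval lets a reader check the "strong" label instead of relying on the plot alone.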
mean(dataset$age)
## [1] 35.32634
sd(dataset$age)
## [1] 11.45344
median(dataset$age)
## [1] 35.79811
Mean, standard deviation, and median for Variable 1 (age).
mean(dataset$education)
## [1] 13.82705
sd(dataset$education)
## [1] 2.595901
median(dataset$education)
## [1] 14.02915
Mean, standard deviation, and median for Variable 2 (education).
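The same three statistics can be computed for both variables in one call. This is a sketch under assumptions: `df` is a small hypothetical data frame standing in for `dataset`, and `describe` is a helper name introduced here for illustration.

```r
# Hypothetical stand-in for `dataset` (values are illustrative only)
df <- data.frame(
  age = c(23, 35, 41, 29, 52),
  education = c(12, 14, 16, 13, 18)
)

# Helper returning the three descriptives as a named vector
describe <- function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x))
}

# Apply the helper to each column; the result is a 3 x 2 matrix
# with statistics in rows and variables in columns
descriptives <- sapply(df[c("age", "education")], describe)
print(round(descriptives, 2))
```

With the real data, replacing `df` with `dataset` should give the values reported above in a single table.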
hist(dataset$age,
     main = "age",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
Variable 1: Age. The first variable looks approximately normally
distributed: the histogram is symmetrical and bell-shaped, and the mean
(35.33) is close to the median (35.80).
hist(dataset$education,
     main = "education",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)
Variable 2: Education. The second variable does not look normally
distributed: the histogram is negatively skewed (the mean, 13.83, falls
slightly below the median, 14.03) and lacks a clear bell shape.
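Skew read off a histogram can be quantified. Base R has no skewness function, so this sketch defines one from the moment formula (third central moment divided by the 1.5 power of the second). `sample_skewness` and the vector `left_tailed` are hypothetical names and values used only for illustration.

```r
# Moment-based sample skewness: m3 / m2^(3/2)
# Negative values indicate a longer left tail (negative skew).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]          # drop missing values
  d <- x - mean(x)           # deviations from the mean
  mean(d^3) / mean(d^2)^1.5
}

# Illustrative vector with one low value pulling the left tail out
left_tailed <- c(2, 9, 10, 11, 11, 12, 12, 13)

sample_skewness(left_tailed)               # negative for this vector
mean(left_tailed) < median(left_tailed)    # mean below median, as with education
```

Applied to `dataset$education`, a value near zero would suggest that the skew seen in the histogram is slight.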
shapiro.test(dataset$age)
##
## Shapiro-Wilk normality test
##
## data: dataset$age
## W = 0.99194, p-value = 0.5581
Variable 1: Age. The Shapiro-Wilk test gives no evidence against normality (W = 0.99, p = .56).
shapiro.test(dataset$education)
##
## Shapiro-Wilk normality test
##
## data: dataset$education
## W = 0.9908, p-value = 0.4385
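Variable 2: Education. The Shapiro-Wilk test gives no evidence against normality (W = 0.99, p = .44), so the slight negative skew seen in the histogram is not a significant departure from normality.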
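The decision rule behind these interpretations can be written out as a small helper. This is a sketch under assumptions: `check_normality` is a hypothetical wrapper around `shapiro.test`, the 0.05 threshold is the conventional default, and the skewed sample is simulated to show what a rejection looks like.

```r
# Wrap shapiro.test and make the decision rule explicit.
# A p-value below alpha rejects the null hypothesis of normality;
# a p-value above alpha only means no evidence against normality.
check_normality <- function(x, alpha = 0.05) {
  st <- shapiro.test(x)
  list(
    W = unname(st$statistic),
    p = st$p.value,
    rejects_normality = st$p.value < alpha
  )
}

# Simulated, clearly non-normal sample (exponential) for contrast
set.seed(1)
skewed <- rexp(200)
res <- check_normality(skewed)
str(res)
```

For `dataset$age` and `dataset$education`, the p-values reported above (0.5581 and 0.4385) are both over 0.05, so `rejects_normality` would be `FALSE` in each case.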