Assign 1:Q1

library(readxl)
library(ggpubr)

## Loading required package: ggplot2

dataset <- read_excel("C:/Users/laksh/Desktop/r studio/dataset.xlsx")

ggscatter(
  dataset,
  x = "age",
  y = "education",
  add = "reg.line",
  xlab = "age",
  ylab = "education"
)

The relationship between age and education is linear, positive, and strong. There are no apparent outliers.
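The direction and strength read off the scatterplot can also be checked numerically. The following is a minimal sketch, assuming both columns are numeric. `demo` is a simulated stand-in with hypothetical values (not the assignment data) so that the block runs on its own; `dataset` can be substituted for it.

```r
# Simulated stand-in for the assignment data (hypothetical values).
set.seed(1)
demo <- data.frame(age = rnorm(150, mean = 35, sd = 11))
demo$education <- 7 + 0.2 * demo$age + rnorm(150, sd = 1)

# Pearson correlation: the sign gives the direction, |r| the strength.
result <- cor.test(demo$age, demo$education, method = "pearson")
r <- unname(result$estimate)
r
result$p.value
```

A value of r close to +1 would support "strong and positive". The scatterplot is still needed to judge linearity and outliers, which a single coefficient cannot show.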

mean(dataset$age)

## [1] 35.32634

sd(dataset$age)

## [1] 11.45344

median(dataset$age)

## [1] 35.79811

Mean, standard deviation, and median of Variable 1 (age), computed above.

mean(dataset$education)

## [1] 13.82705

sd(dataset$education)

## [1] 2.595901

median(dataset$education)

## [1] 14.02915

Mean, standard deviation, and median of Variable 2 (education), computed above.
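The same three statistics can be produced for both variables in one call. This is a sketch only: `describe` is a hypothetical helper name, and `demo` is a simulated stand-in for `dataset` so the block is self-contained.

```r
# Simulated stand-in for the assignment data (hypothetical values).
set.seed(1)
demo <- data.frame(
  age = rnorm(150, mean = 35, sd = 11),
  education = rnorm(150, mean = 14, sd = 2.6)
)

# Hypothetical helper: returns the three summary statistics for one vector.
describe <- function(x) c(mean = mean(x), sd = sd(x), median = median(x))

# Apply it to every column; the result is a 3 x 2 matrix (statistic x variable).
stats <- vapply(demo, describe, FUN.VALUE = c(mean = 0, sd = 0, median = 0))
round(stats, 2)
```

Putting the results side by side makes it easier to compare mean and median within each variable.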

hist(dataset$age,
     main = "age",
     breaks = 20,
     col = "lightblue",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

Variable 1: Age. The histogram looks approximately normal: the distribution is symmetric and bell-shaped.

hist(dataset$education,
     main = "education",
     breaks = 20,
     col = "lightcoral",
     border = "white",
     cex.main = 1,
     cex.axis = 1,
     cex.lab = 1)

Variable 2: Education. The histogram does not look normally distributed: the data appear negatively skewed and lack a clear bell shape.
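Skew can be quantified rather than judged by eye. The reported mean (13.83) sitting slightly below the median (14.03) already points toward a longer left tail. The sketch below defines a moment-based sample skewness; `skewness` is a hypothetical helper written here (base R has no built-in), and `example_left` is made-up illustration data, not the assignment data.

```r
# Moment-based sample skewness: negative values indicate a longer left tail.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Hypothetical left-skewed example: most values are high, a few are low.
example_left <- c(6, 9, 11, 12, 13, 14, 14, 15, 15, 16, 16, 16)
skewness(example_left)

# Applied to the assignment data this would be: skewness(dataset$education)
```

A result near zero suggests symmetry; a clearly negative result would support the "negatively skewed" reading of the histogram.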

shapiro.test(dataset$age)

## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$age
## W = 0.99194, p-value = 0.5581

Variable 1: Age. The Shapiro-Wilk test gives no evidence against normality (W = 0.99, p = .56), so the first variable can be treated as normally distributed.

shapiro.test(dataset$education)

## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$education
## W = 0.9908, p-value = 0.4385

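Variable 2: Education. The Shapiro-Wilk test gives no evidence against normality (W = 0.99, p = .44), so the second variable can be treated as normally distributed; the apparent skew in the histogram is not strong enough for the test to reject normality.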
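When a histogram and a Shapiro-Wilk test seem to disagree, a normal Q-Q plot is a common additional check. The sketch below uses base R only; `demo_x` is a simulated stand-in (hypothetical values), and `dataset$education` could be substituted for it.

```r
# Simulated stand-in for one variable (hypothetical values).
set.seed(1)
demo_x <- rnorm(150, mean = 14, sd = 2.6)

# Sample quantiles against theoretical normal quantiles.
qq <- qqnorm(demo_x, main = "Normal Q-Q plot (simulated example)")

# Reference line through the first and third quartiles.
qqline(demo_x, col = "red")
```

Points that follow the line support normality; a systematic bend away from the line at the lower end would indicate a left (negative) skew.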

Alekhya

2026-04-08