# Load required packages
library(readxl)
library(psych)
# Load dataset
dataset <- read_excel("C:\\Users\\navya\\Downloads\\A5RQ1.xlsx")
# Display descriptive statistics for both variables
describe(dataset[, c("Minutes", "Drinks")])
##         vars   n  mean    sd median trimmed   mad min   max range skew kurtosis   se
## Minutes    1 461 29.89 18.63   24.4   26.99 15.12  10 154.2 144.2 1.79     5.20 0.87
## Drinks     2 461  3.00  1.95    3.0    2.75  1.48   0  17.0  17.0 1.78     6.46 0.09
# Histograms for checking normality
hist(dataset$Minutes,
main = "Histogram of Minutes",
xlab = "Value",
ylab = "Frequency",
col = "lightblue",
border = "black",
breaks = 20)
hist(dataset$Drinks,
main = "Histogram of Drinks",
xlab = "Value",
ylab = "Frequency",
col = "lightgreen",
border = "black",
breaks = 20)
## Review of Histograms
# ........................................................
# QUESTION
# Answer the questions below as comments within the R script:
# Q1) Check the SKEWNESS of the VARIABLE 1 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# It appears positively skewed: most values cluster at the low end, with a long tail toward high values (median 24.4, max 154.2; skew = 1.79 in describe()).
# Q2) Check the KURTOSIS of the VARIABLE 1 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# The histogram for Variable 1 is too tall (leptokurtic): a sharp peak with heavy tails rather than a proper bell curve (kurtosis = 5.20 in describe()).
# Q3) Check the SKEWNESS of the VARIABLE 2 histogram. In your opinion, does the histogram look symmetrical, positively skewed, or negatively skewed?
# It appears positively skewed: most customers bought few drinks (median 3), with a long tail extending to 17 (skew = 1.78 in describe()).
# Q4) Check the KURTOSIS of the VARIABLE 2 histogram. In your opinion, does the histogram look too flat, too tall, or does it have a proper bell curve?
# The histogram for Variable 2 is too tall (leptokurtic): a sharp peak with heavy tails rather than a proper bell curve (kurtosis = 6.46 in describe()).
# ........................................................
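# A rough numeric cross-check of the visual answers above, as a sketch only.
# `moment_shape` is a hypothetical helper (not part of psych). It uses the plain
# moment formulas, so its values may differ slightly from describe(), which
# applies a different estimator. Positive skew means a longer right tail;
# positive excess kurtosis means a taller peak and heavier tails than a normal curve.
moment_shape <- function(x) {
  x <- x[!is.na(x)]   # drop missing values
  d <- x - mean(x)    # deviations from the mean
  m2 <- mean(d^2)     # second central moment
  c(skew = mean(d^3) / m2^1.5,
    excess_kurtosis = mean(d^4) / m2^2 - 3)
}
# Assumes `dataset` was loaded above; the guard lets the helper be sourced on its own.
if (exists("dataset")) {
  print(sapply(dataset[, c("Minutes", "Drinks")], moment_shape))
}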
## Shapiro-Wilk tests for normality
shapiro.test(dataset$Minutes)
##
## Shapiro-Wilk normality test
##
## data: dataset$Minutes
## W = 0.84706, p-value < 2.2e-16
shapiro.test(dataset$Drinks)
##
## Shapiro-Wilk normality test
##
## data: dataset$Drinks
## W = 0.85487, p-value < 2.2e-16
# .........................................................
# QUESTION
# Answer the questions below as a comment within the R script:
# Was the data normally distributed for Variable 1?
# Answer: No. The Shapiro-Wilk test gave W = 0.847, p < 2.2e-16 (p < .05), so Variable 1 (Minutes) was not normally distributed.
# Was the data normally distributed for Variable 2?
# Answer: No. The Shapiro-Wilk test gave W = 0.855, p < 2.2e-16 (p < .05), so Variable 2 (Drinks) was not normally distributed.
# If the data is normal for both variables, continue with the Pearson Correlation test.
# If one or both variables are NOT normal, change to the Spearman Correlation test.
# Neither variable was normal here, so the Spearman Correlation test applies.
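# The decision rule above can be sketched as a small helper. `choose_cor_method`
# is a hypothetical name and alpha = 0.05 is an assumed cutoff. It returns
# "pearson" only when both Shapiro-Wilk p-values exceed alpha, otherwise "spearman".
choose_cor_method <- function(x, y, alpha = 0.05) {
  p_x <- shapiro.test(x)$p.value   # normality test for the first variable
  p_y <- shapiro.test(y)$p.value   # normality test for the second variable
  if (p_x > alpha && p_y > alpha) "pearson" else "spearman"
}
# Assumes `dataset` was loaded above; the guard lets the helper be sourced on its own.
if (exists("dataset")) {
  print(choose_cor_method(dataset$Minutes, dataset$Drinks))
}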
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(ggpubr)
# Scatterplot to visually show the relationship between two continuous variables.
ggscatter(dataset, x = "Minutes", y = "Drinks",
add = "reg.line",
conf.int = TRUE,
cor.coef = TRUE,
cor.method = "spearman",  # rank-based coefficient, since neither variable is normal
xlab = "Minutes", ylab = "Drinks")
# ........................................................
# QUESTION
# Answer the questions below as a comment within the R script:
# Is the relationship positive (line pointing up), negative (line pointing down), or is there no relationship (line is flat)?
# The relationship between Minutes and Drinks is positive (line pointing up), because the regression line slopes upward from left to right.
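# Decision: Both Shapiro-Wilk p-values were < .05, so neither variable is normal and Spearman is the appropriate test.
# The Pearson result below is kept for comparison only.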
cor.test(dataset$Minutes, dataset$Drinks, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: dataset$Minutes and dataset$Drinks
## t = 68.326, df = 459, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9452363 0.9617123
## sample estimates:
## cor
## 0.9541922
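# Because both variables failed the Shapiro-Wilk test, the rank-based Spearman
# version of the same call is sketched here. `run_spearman` is a hypothetical
# wrapper. exact = FALSE is an assumption: it requests the asymptotic p-value,
# which avoids the warning R gives when the data contain ties (likely for a
# count variable such as Drinks).
run_spearman <- function(x, y) {
  cor.test(x, y, method = "spearman", exact = FALSE)
}
# Assumes `dataset` was loaded above; the guard lets the helper be sourced on its own.
if (exists("dataset")) {
  print(run_spearman(dataset$Minutes, dataset$Drinks))
}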
# A Spearman correlation was conducted to examine the relationship between time spent in the shop
# and number of drinks purchased (n = 461 customers). There was a statistically significant
# correlation between time spent (M = 29.89, SD = 18.63) and number of drinks (M = 3.00, SD = 1.95).
# The correlation was positive and strong, ρ(459) = 0.92, p < .05. As time spent in the shop
# increases, the number of drinks purchased also increases.