library(readr)
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ stringr 1.5.0
## ✔ forcats 1.0.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(pwr)
library(effsize)
my_data <- read.csv("C:/Users/prase/OneDrive/Documents/STATISTICS/signal_metrics.csv")
summary(my_data)
##   Timestamp           Locality            Latitude       Longitude
##  Length:12621       Length:12621       Min.   :25.41   Min.   :84.96
##  Class :character   Class :character   1st Qu.:25.52   1st Qu.:85.07
##  Mode  :character   Mode  :character   Median :25.59   Median :85.14
##                                        Mean   :25.59   Mean   :85.14
##                                        3rd Qu.:25.67   3rd Qu.:85.21
##                                        Max.   :25.77   Max.   :85.32
##  SignalStrength     DataThroughput      Latency         NetworkType
##  Min.   :-116.94   Min.   : 1.001   Min.   : 10.02   Length:12621
##  1st Qu.: -94.88   1st Qu.: 2.492   1st Qu.: 39.96   Class :character
##  Median : -91.41   Median : 6.463   Median : 75.21   Mode  :character
##  Mean   : -91.76   Mean   :20.909   Mean   : 85.28
##  3rd Qu.: -88.34   3rd Qu.:31.504   3rd Qu.:125.96
##  Max.   : -74.64   Max.   :99.986   Max.   :199.99
##      BB60C             srsRAN          BladeRFxA9
##  Min.   :-115.67   Min.   :-124.65   Min.   :-119.21
##  1st Qu.: -95.49   1st Qu.:-102.55   1st Qu.: -95.17
##  Median : -91.60   Median : -98.96   Median : -91.46
##  Mean   : -91.77   Mean   : -99.26   Mean   : -91.77
##  3rd Qu.: -87.79   3rd Qu.: -95.67   3rd Qu.: -88.15
##  Max.   : -72.50   Max.   : -81.32   Max.   : -74.51
# Preview the first rows of the data set.
# Note: head(data) would print the source of base R's data() function, not the data set.
head(my_data)
total_rows <- nrow(my_data)
sample_size <- round(0.5 * total_rows)
set.seed(2)
rand_sample_1 <- sample(1:total_rows, sample_size, replace = TRUE)
set.seed(4)
rand_sample_2 <- sample(1:total_rows, sample_size, replace = TRUE)
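Both index vectors above are drawn with replacement, so a row can appear more than once within a sample, and the two samples can also share rows. The following is a minimal sketch of that difference; it uses a made-up population of 100 row indices rather than the real data set, and the names `n_rows`, `n_draw`, `with_repl`, and `without_repl` are illustrative only.

```r
# Illustrative only: a hypothetical population of 100 row indices.
n_rows <- 100
n_draw <- round(0.5 * n_rows)

set.seed(2)
with_repl <- sample(seq_len(n_rows), n_draw, replace = TRUE)      # rows may repeat

set.seed(2)
without_repl <- sample(seq_len(n_rows), n_draw, replace = FALSE)  # every row distinct

# Count how many drawn indices are repeats under each scheme.
cat("Repeated rows with replacement:   ", sum(duplicated(with_repl)), "\n")
cat("Repeated rows without replacement:", sum(duplicated(without_repl)), "\n")
```

Sampling without replacement always reports zero repeats. Which scheme is appropriate depends on whether the samples are meant as bootstrap-style resamples or as distinct subsets of the rows.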
H0: There is no difference between the mean of datathroughput_1 and the mean of datathroughput_2.
H1: There is a difference between the mean of datathroughput_1 and the mean of datathroughput_2.
I extracted two random samples from the DataThroughput column of the data set. Each sample is drawn with replacement and contains half as many rows as the full data set.
df_1 <- data.frame(my_data[rand_sample_1, ])
df_2 <- data.frame(my_data[rand_sample_2, ])
datathroughput_1 <- df_1$DataThroughput
datathroughput_2 <- df_2$DataThroughput
summary(datathroughput_1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.002 2.471 6.399 20.546 31.052 99.986
summary(datathroughput_2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.001 2.572 6.443 21.195 31.496 99.887
The alpha value (significance level) is 0.05 (5%). It is the probability of a Type I error, that is, of rejecting the null hypothesis when it is true.
The power is 0.50 (50%). It is the probability of correctly rejecting the null hypothesis when it is false, so there is a 50% chance of detecting a true effect of the stated size.
The minimum effect size is 0.50. It is the smallest effect that we consider practically significant.
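The three design values above can be tied together with a power calculation. The sketch below uses base R's `power.t.test()` and assumes that the effect size of 0.5 is a standardized mean difference (difference divided by the standard deviation), which the text does not state explicitly. The per-group size of 6310 follows from the 6309 degrees of freedom reported for each sample.

```r
# Base R power calculation for a two-sample, two-sided t-test.
# Assumption: the effect size 0.5 is a standardized difference (delta / sd).
alpha <- 0.05
power <- 0.5
effect_size <- 0.5

# Sample size per group needed to reach the stated power.
needed <- power.t.test(delta = effect_size, sd = 1,
                       sig.level = alpha, power = power,
                       type = "two.sample", alternative = "two.sided")
cat("Required n per group:", ceiling(needed$n), "\n")

# Power achieved with 6310 observations per group (6309 + 1).
achieved <- power.t.test(n = 6310, delta = effect_size, sd = 1,
                         sig.level = alpha,
                         type = "two.sample", alternative = "two.sided")
cat("Achieved power:", achieved$power, "\n")
```

Under these assumptions the samples used here are far larger than a power of 0.50 requires. The `pwr` package loaded at the top offers `pwr.t.test()` for the same kind of calculation.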
t_test_result <- t.test(datathroughput_1, datathroughput_2)
alpha <- 0.05
power <- 0.5
effect_size <- 0.5
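# Two-sided critical value from the t distribution, using the Welch degrees of freedom
critical_t_value <- qt(1 - alpha / 2, df = t_test_result$parameter)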
observed_t_value <- t_test_result$statistic
p_value <- t_test_result$p.value
cat("P-value:", p_value, "\n")
## P-value: 0.1939871
if (abs(observed_t_value) <= critical_t_value) {
  cat("Fail to reject the null hypothesis (H0).\n")
  cat("There is no significant difference between datathroughput_1 and datathroughput_2.")
} else {
  cat("Reject the null hypothesis (H0).\n")
  cat("There is a significant difference between datathroughput_1 and datathroughput_2.")
}
## Fail to reject the null hypothesis (H0).
## There is no significant difference between datathroughput_1 and datathroughput_2.
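The decision rule above compares the observed t statistic with a critical value. One way to keep the decision and its printed message from drifting apart is to compute both in a single place. The sketch below does this with a hypothetical helper, `t_test_decision()`, run on simulated data rather than the signal metrics; it is one possible arrangement, not the method used in this report.

```r
# Hypothetical helper: returns a decision and a message that always agree.
t_test_decision <- function(x, y, alpha = 0.05) {
  res <- t.test(x, y)  # Welch two-sample t-test (R's default)
  t_obs <- unname(res$statistic)
  crit <- unname(qt(1 - alpha / 2, df = res$parameter))
  reject <- abs(t_obs) > crit
  msg <- if (reject) {
    "Reject H0: the means differ significantly."
  } else {
    "Fail to reject H0: no significant difference in means."
  }
  list(t = t_obs, critical = crit, p_value = res$p.value,
       reject = reject, message = msg)
}

# Simulated data, not the signal_metrics measurements.
set.seed(1)
x <- rnorm(200, mean = 20, sd = 5)
y <- rnorm(200, mean = 20, sd = 5)
out <- t_test_decision(x, y)
cat(out$message, "\n")
```

For a two-sided t-test, comparing |t| with the critical value gives the same decision as comparing the p-value with alpha, so either form can be reported.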
alpha <- 0.05
power <- 0.5
min_effect_size <- 0.5
variance_datathroughput_1 <- var(datathroughput_1)
variance_datathroughput_2 <- var(datathroughput_2)
f_statistic <- variance_datathroughput_1 / variance_datathroughput_2
df1 <- length(datathroughput_1) - 1
df2 <- length(datathroughput_2) - 1
# One-sided (upper-tail) p-value: tests whether the variance of sample 1 exceeds that of sample 2
p_value <- pf(f_statistic, df1, df2, lower.tail = FALSE)
cat("Degrees of freedom (DF1, DF2):", df1, df2, "\n")
## Degrees of freedom (DF1, DF2): 6309 6309
cat("Fisher's F-test p-value:", p_value, "\n")
## Fisher's F-test p-value: 0.975473
if (p_value < alpha) {
cat("Reject the null hypothesis \n")
} else {
cat("Fail to reject the null hypothesis \n")
}
## Fail to reject the null hypothesis
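The p-value above is the upper tail of the F distribution, so it is a one-sided test of whether the first variance is larger than the second. The sketch below, on simulated data, shows how a two-sided p-value can be obtained from the same statistic and checks it against base R's `var.test()`, which is two-sided by default.

```r
# Simulated data, not the signal_metrics measurements.
set.seed(3)
x <- rnorm(100, sd = 10)
y <- rnorm(100, sd = 10)

# Manual variance-ratio test.
f_stat <- var(x) / var(y)
df1 <- length(x) - 1
df2 <- length(y) - 1
p_upper <- pf(f_stat, df1, df2, lower.tail = FALSE)  # H1: var(x) > var(y)
p_two_sided <- 2 * min(p_upper, 1 - p_upper)         # H1: var(x) != var(y)

# Built-in equivalent; var.test() is two-sided by default.
vt <- var.test(x, y)

cat("Upper-tail p-value:", p_upper, "\n")
cat("Two-sided p-value: ", p_two_sided, "\n")
cat("var.test() p-value:", vt$p.value, "\n")
```

Applied to the upper-tail value of 0.975473 reported above, the two-sided formula gives 2 × (1 − 0.975473) ≈ 0.049, which is just under 0.05. The choice of a one-sided or two-sided alternative therefore matters for the DataThroughput variances.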
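I performed two different hypothesis tests: a Neyman-Pearson style test (a two-sample t-test with a fixed alpha and a reject/fail-to-reject decision) and a Fisher style test (a p-value from an F-test). Both failed to reject the null hypothesis at alpha = 0.05. My null hypothesis is "There is no difference between the mean of datathroughput_1 and the mean of datathroughput_2". Note that the F statistic used here is a ratio of sample variances, so it compares the spread of the two samples; only the t-test bears directly on the means. The two samples have means of 20.546 and 21.195. Failing to reject the null hypothesis means the data do not provide sufficient evidence that the means differ; it does not prove that they are equal.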
H0: There is no difference between the mean of latency_1 and the mean of latency_2.
H1: There is a difference between the mean of latency_1 and the mean of latency_2.
I extracted two random samples from the Latency column of the data set, using the same row indices as for DataThroughput.
latency_1 <-df_1$Latency
latency_2 <-df_2$Latency
summary(latency_1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.03 40.81 74.99 85.78 127.21 199.99
summary(latency_2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.04 39.91 74.02 84.39 124.71 199.97
The alpha value (significance level) is 0.05 (5%). It is the probability of a Type I error, that is, of rejecting the null hypothesis when it is true.
The power is 0.50 (50%). It is the probability of correctly rejecting the null hypothesis when it is false, so there is a 50% chance of detecting a true effect of the stated size.
The minimum effect size is 0.50. It is the smallest effect that we consider practically significant.
t_test_result <- t.test(latency_1, latency_2)
alpha <- 0.05
power <- 0.5
effect_size <- 0.5
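# Two-sided critical value from the t distribution, using the Welch degrees of freedom
critical_t_value <- qt(1 - alpha / 2, df = t_test_result$parameter)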
observed_t_value <- t_test_result$statistic
p_value <- t_test_result$p.value
cat("P-value:", p_value, "\n")
## P-value: 0.1461691
if (abs(observed_t_value) <= critical_t_value) {
  cat("Fail to reject the null hypothesis (H0).\n")
  cat("There is no significant difference between latency_1 and latency_2.")
} else {
  cat("Reject the null hypothesis (H0).\n")
  cat("There is a significant difference between latency_1 and latency_2.")
}
## Fail to reject the null hypothesis (H0).
## There is no significant difference between latency_1 and latency_2.
alpha <- 0.05
power <- 0.5
min_effect_size <- 0.5
variance_latency_1 <- var(latency_1)
variance_latency_2 <- var(latency_2)
f_statistic <- variance_latency_1 / variance_latency_2
df1 <- length(latency_1) - 1
df2 <- length(latency_2) - 1
# One-sided (upper-tail) p-value: tests whether the variance of sample 1 exceeds that of sample 2
p_value <- pf(f_statistic, df1, df2, lower.tail = FALSE)
cat("Degrees of freedom (DF1, DF2):", df1, df2, "\n")
## Degrees of freedom (DF1, DF2): 6309 6309
cat("Fisher's F-test p-value:", p_value, "\n")
## Fisher's F-test p-value: 0.2057765
if (p_value < alpha) {
cat("Reject the null hypothesis \n")
} else {
cat("Fail to reject the null hypothesis \n")
}
## Fail to reject the null hypothesis
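I performed two different hypothesis tests: a Neyman-Pearson style test (a two-sample t-test with a fixed alpha and a reject/fail-to-reject decision) and a Fisher style test (a p-value from an F-test). Both failed to reject the null hypothesis at alpha = 0.05. My null hypothesis is "There is no difference between the mean of latency_1 and the mean of latency_2". As with DataThroughput, the F statistic is a ratio of sample variances, so only the t-test bears directly on the means. The two samples have means of 85.78 and 84.39. Failing to reject the null hypothesis means the data do not provide sufficient evidence that the means differ; it does not prove that they are equal.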
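The minimum effect size of 0.50 is stated but never compared with an observed effect. Assuming it is meant as a standardized difference (Cohen's d), the observed value could be computed along the lines of the sketch below. `cohens_d()` is a hypothetical helper written in base R, and the input vectors are toy values chosen so the result can be checked by hand; they are not taken from the data set.

```r
# Hypothetical helper: Cohen's d using a pooled standard deviation.
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Toy vectors: means 3 and 4, both variances 2.5, so d = -1 / sqrt(2.5).
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)
d <- cohens_d(x, y)
cat("Cohen's d:", round(d, 4), "\n")  # -0.6325
```

Calling `cohens_d(latency_1, latency_2)` would give the observed effect for the Latency samples, which can then be compared with 0.50. The `effsize` package loaded at the top provides `cohen.d()` for the same purpose.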
ggplot(
  data = data.frame(
    Group = rep(c("DataThroughput 1", "DataThroughput 2"), each = length(datathroughput_1)),
    DataThroughput = c(datathroughput_1, datathroughput_2)
  )
) +
  geom_boxplot(aes(x = Group, y = DataThroughput, fill = Group)) +
  labs(title = "Box Plot for DataThroughput Comparison",
       x = "Groups",
       y = "DataThroughput") +
  theme_bw()
ggplot(
  data = data.frame(
    Group = rep(c("Latency 1", "Latency 2"), each = length(latency_1)),
    Latency = c(latency_1, latency_2)
  )
) +
  geom_boxplot(aes(x = Group, y = Latency, fill = Group)) +
  labs(title = "Box Plot for Latency Comparison",
       x = "Groups",
       y = "Latency") +
  theme_bw()
The two box plots above, one for DataThroughput and one for Latency, show similar medians and spreads within each pair of samples. This is consistent with the hypothesis tests, which failed to reject the null hypothesis in both cases. The data therefore give no evidence of a difference between the sample means for either column, and I conclude that the samples drawn from my data set have approximately the same mean.