R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

library(readr)
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ stringr   1.5.0
## ✔ forcats   1.0.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(pwr)
library(effsize)

Data Exploration

my_data <- read.csv("C:/Users/prase/OneDrive/Documents/STATISTICS/signal_metrics.csv")
summary(my_data)
##   Timestamp           Locality            Latitude       Longitude    
##  Length:12621       Length:12621       Min.   :25.41   Min.   :84.96  
##  Class :character   Class :character   1st Qu.:25.52   1st Qu.:85.07  
##  Mode  :character   Mode  :character   Median :25.59   Median :85.14  
##                                        Mean   :25.59   Mean   :85.14  
##                                        3rd Qu.:25.67   3rd Qu.:85.21  
##                                        Max.   :25.77   Max.   :85.32  
##  SignalStrength    DataThroughput      Latency       NetworkType       
##  Min.   :-116.94   Min.   : 1.001   Min.   : 10.02   Length:12621      
##  1st Qu.: -94.88   1st Qu.: 2.492   1st Qu.: 39.96   Class :character  
##  Median : -91.41   Median : 6.463   Median : 75.21   Mode  :character  
##  Mean   : -91.76   Mean   :20.909   Mean   : 85.28                     
##  3rd Qu.: -88.34   3rd Qu.:31.504   3rd Qu.:125.96                     
##  Max.   : -74.64   Max.   :99.986   Max.   :199.99                     
##      BB60C             srsRAN          BladeRFxA9     
##  Min.   :-115.67   Min.   :-124.65   Min.   :-119.21  
##  1st Qu.: -95.49   1st Qu.:-102.55   1st Qu.: -95.17  
##  Median : -91.60   Median : -98.96   Median : -91.46  
##  Mean   : -91.77   Mean   : -99.26   Mean   : -91.77  
##  3rd Qu.: -87.79   3rd Qu.: -95.67   3rd Qu.: -88.15  
##  Max.   : -72.50   Max.   : -81.32   Max.   : -74.51
head(my_data)
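Before drawing samples, it can help to confirm the column types and check for missing values; a minimal sketch using base R, assuming the columns shown in the summary above:

str(my_data)              # column types and dimensions
colSums(is.na(my_data))   # number of missing values per column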

Sample Size Calculation

total_rows <- nrow(my_data)
sample_size <- round(0.5 * total_rows)

set.seed(2)
rand_sample_1 <- sample(1:total_rows, sample_size, replace = TRUE)

set.seed(4)
rand_sample_2 <- sample(1:total_rows, sample_size, replace = TRUE)
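Because the indices are drawn with replace = TRUE, some rows can appear more than once in each sample; a quick sketch to confirm the sample size and see the share of repeated indices:

length(rand_sample_1)            # should equal sample_size
mean(duplicated(rand_sample_1))  # proportion of repeated row indices in sample 1
mean(duplicated(rand_sample_2))  # proportion of repeated row indices in sample 2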

Hypothesis

Hypothesis 1

H0: There is no significant difference between the means of datathroughput_1 and datathroughput_2.

H1: There is a significant difference between the means of datathroughput_1 and datathroughput_2.

I have extracted two random samples from the DataThroughput column of the data set.

df_1<- data.frame(my_data[rand_sample_1,])
df_2<- data.frame(my_data[rand_sample_2,])
datathroughput_1 <-df_1$DataThroughput
datathroughput_2 <-df_2$DataThroughput
summary(datathroughput_1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.002   2.471   6.399  20.546  31.052  99.986
summary(datathroughput_2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.001   2.572   6.443  21.195  31.496  99.887

Alpha Value (Significance Level) is 0.05 (5%). It corresponds to a 5% chance of making a Type I error.

Power is 0.50 (50%). It is the probability of correctly rejecting the null hypothesis when it’s false. It also means we have a 50% chance of detecting a true effect if it exists.

Minimum Effect Size is 0.50. It represents the smallest effect size that we consider practically significant.
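Since the pwr package is loaded above, these three settings can be translated into a sample-size sketch: pwr.t.test solves for whichever of n, d, sig.level, and power is left unspecified. This is only an illustration of the chosen parameters, not part of the test itself.

# Sketch: observations per group needed to detect a standardized effect of 0.5
# with alpha = 0.05 and power = 0.5 in a two-sample t-test
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.5, type = "two.sample")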

t_test_result <- t.test(datathroughput_1, datathroughput_2)
alpha <- 0.05
power <- 0.5
effect_size <- 0.5
critical_t_value <- qnorm(1 - alpha / 2)  # normal approximation to the t critical value (fine for large samples)
observed_t_value <- t_test_result$statistic
p_value <- t_test_result$p.value
cat("P-value:", p_value, "\n")
## P-value: 0.1939871
if (abs(observed_t_value) <= critical_t_value) {
  cat("Fail to reject the null hypothesis (H0).\n")
  cat("There is no significant difference between datathroughput_1 and datathroughput_2.")
} else {
  cat("Reject the null hypothesis (H0).\n")
  cat("There is a significant difference between datathroughput_1 and datathroughput_2.")
}
## Fail to reject the null hypothesis (H0).
## There is no significant difference between datathroughput_1 and datathroughput_2.
alpha <- 0.05
power <- 0.5
min_effect_size <- 0.5
variance_datathroughput_1 <- var(datathroughput_1)
variance_datathroughput_2 <- var(datathroughput_2)
f_statistic <- variance_datathroughput_1 / variance_datathroughput_2
df1 <- length(datathroughput_1) - 1
df2 <- length(datathroughput_2) - 1
p_value <- pf(f_statistic, df1, df2, lower.tail = FALSE)
cat("Degrees of freedom (DF1, DF2):", df1, df2, "\n")
## Degrees of freedom (DF1, DF2): 6309 6309
cat("Fisher's F-test p-value:", p_value, "\n")
## Fisher's F-test p-value: 0.975473
if (p_value < alpha) {
  cat("Reject the null hypothesis \n")
} else {
  cat("Fail to reject the null hypothesis \n")
}
## Fail to reject the null hypothesis
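The same variance-ratio test is available through base R's var.test(); with alternative = "greater" it reproduces the one-sided p-value computed manually above. A minimal sketch for cross-checking:

# Sketch: built-in F test of the ratio of the two sample variances;
# alternative = "greater" matches pf(f_statistic, df1, df2, lower.tail = FALSE)
var.test(datathroughput_1, datathroughput_2, alternative = "greater")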

I performed two different hypothesis tests: a Neyman-Pearson style t-test and a Fisher-style test (here, an F test on the ratio of the two sample variances). Both failed to reject the null hypothesis. My null hypothesis is “There is no significant difference between the means of datathroughput_1 and datathroughput_2”. The two samples have means of 20.546 and 21.195, so failing to reject the null hypothesis indicates that both samples have approximately the same mean.
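The effsize package loaded at the top can also report the observed standardized difference between the two samples, which can be set against the 0.50 minimum effect size assumed above; a minimal sketch:

# Sketch: observed Cohen's d for the two DataThroughput samples
cohen.d(datathroughput_1, datathroughput_2)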

Hypothesis 2

H0: There is no significant difference between the means of latency_1 and latency_2.

H1: There is a significant difference between the means of latency_1 and latency_2.

I have extracted two random samples from the Latency column of the data set.

latency_1 <-df_1$Latency
latency_2 <-df_2$Latency
summary(latency_1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.03   40.81   74.99   85.78  127.21  199.99
summary(latency_2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.04   39.91   74.02   84.39  124.71  199.97

Alpha Value (Significance Level) is 0.05 (5%). It corresponds to a 5% chance of making a Type I error.

Power is 0.50 (50%). It is the probability of correctly rejecting the null hypothesis when it’s false. It also means we have a 50% chance of detecting a true effect if it exists.

Minimum Effect Size is 0.50. It represents the smallest effect size that we consider practically significant.

t_test_result <- t.test(latency_1, latency_2)
alpha <- 0.05
power <- 0.5
effect_size <- 0.5
critical_t_value <- qnorm(1 - alpha / 2)  # normal approximation to the t critical value (fine for large samples)
observed_t_value <- t_test_result$statistic
p_value <- t_test_result$p.value
cat("P-value:", p_value, "\n")
## P-value: 0.1461691
if (abs(observed_t_value) <= critical_t_value) {
  cat("Fail to reject the null hypothesis (H0).\n")
  cat("There is no significant difference between latency_1 and latency_2.")
} else {
  cat("Reject the null hypothesis (H0).\n")
  cat("There is a significant difference between latency_1 and latency_2.")
}
## Fail to reject the null hypothesis (H0).
## There is no significant difference between latency_1 and latency_2.
alpha <- 0.05
power <- 0.5
min_effect_size <- 0.5
variance_latency_1 <- var(latency_1)
variance_latency_2 <- var(latency_2)
f_statistic <- variance_latency_1 / variance_latency_2
df1 <- length(latency_1) - 1
df2 <- length(latency_2) - 1
p_value <- pf(f_statistic, df1, df2, lower.tail = FALSE)
cat("Degrees of freedom (DF1, DF2):", df1, df2, "\n")
## Degrees of freedom (DF1, DF2): 6309 6309
cat("Fisher's F-test p-value:", p_value, "\n")
## Fisher's F-test p-value: 0.2057765
if (p_value < alpha) {
  cat("Reject the null hypothesis \n")
} else {
  cat("Fail to reject the null hypothesis \n")
}
## Fail to reject the null hypothesis

I performed two different hypothesis tests: a Neyman-Pearson style t-test and a Fisher-style test (here, an F test on the ratio of the two sample variances). Both failed to reject the null hypothesis. My null hypothesis is “There is no significant difference between the means of latency_1 and latency_2”. The two samples have means of 85.78 and 84.39, so failing to reject the null hypothesis indicates that both samples have approximately the same mean.

Visualization

ggplot(data = data.frame(Group = rep(c("DataThroughput 1", "DataThroughput 2"), each = length(datathroughput_1)), DataThroughput = c(datathroughput_1, datathroughput_2))) +
  geom_boxplot(aes(x = Group, y = DataThroughput, fill = Group)) +
  labs(title = "Box Plot for DataThroughput Comparison",
       x = "Groups",
       y = "DataThroughput") +
  theme_bw()

ggplot(data = data.frame(Group = rep(c("Latency 1", "Latency 2"), each = length(latency_1)), Latency = c(latency_1, latency_2))) +
  geom_boxplot(aes(x = Group, y = Latency, fill = Group)) +
  labs(title = "Box Plot for Latency Comparison",
       x = "Groups",
       y = "Latency") +
  theme_bw()

The two box plots above, for DataThroughput and Latency, show similar central tendencies and spreads, which supports the results of the hypothesis tests that failed to reject the null hypothesis. This suggests there is no significant difference between the means of the samples drawn from the two columns, and that the paired samples in my dataset have approximately the same mean.
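As a complementary view of the same comparison, overlaid density plots show whether the two samples also match in shape rather than just in centre and spread; a sketch reusing the objects created above:

ggplot(data.frame(Group = rep(c("DataThroughput 1", "DataThroughput 2"),
                              each = length(datathroughput_1)),
                  DataThroughput = c(datathroughput_1, datathroughput_2))) +
  geom_density(aes(x = DataThroughput, fill = Group), alpha = 0.4) +
  labs(title = "Density Plot for DataThroughput Comparison",
       x = "DataThroughput", y = "Density") +
  theme_bw()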