library(readr)
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ stringr 1.5.0
## ✔ forcats 1.0.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(pwr)
data <- read.csv("C:/Users/prase/OneDrive/Documents/STATISTICS/signal_metrics.csv")
head(data)
## Timestamp Locality Latitude Longitude SignalStrength DataThroughput
## 1 51:30.7 Danapur 25.42617 85.09443 -76.72446 1.105452
## 2 23:56.4 Bankipore 25.59105 85.25081 -77.52335 2.476287
## 3 24:39.7 Ashok Rajpath 25.48233 85.14868 -78.55790 1.031408
## 4 02:26.4 Rajendra Nagar 25.46116 85.23826 -78.77064 1.461008
## 5 32:12.7 Ashok Rajpath 25.61583 85.10455 -77.27129 1.792531
## 6 58:31.2 Rajendra Nagar 25.56698 85.12149 -75.67285 2.572450
## Latency NetworkType BB60C srsRAN BladeRFxA9
## 1 138.9383 LTE -72.50342 -84.97208 -75.12779
## 2 137.6606 LTE -73.45848 -84.77590 -77.94294
## 3 165.4447 LTE -73.88210 -84.76128 -77.21692
## 4 101.6800 LTE -74.04047 -87.27312 -77.86791
## 5 177.4726 LTE -74.08004 -85.93112 -75.57369
## 6 131.5178 LTE -74.66450 -85.16332 -74.51283
str(data)
## 'data.frame': 12621 obs. of 11 variables:
## $ Timestamp : chr "51:30.7" "23:56.4" "24:39.7" "02:26.4" ...
## $ Locality : chr "Danapur" "Bankipore" "Ashok Rajpath" "Rajendra Nagar" ...
## $ Latitude : num 25.4 25.6 25.5 25.5 25.6 ...
## $ Longitude : num 85.1 85.3 85.1 85.2 85.1 ...
## $ SignalStrength: num -76.7 -77.5 -78.6 -78.8 -77.3 ...
## $ DataThroughput: num 1.11 2.48 1.03 1.46 1.79 ...
## $ Latency : num 139 138 165 102 177 ...
## $ NetworkType : chr "LTE" "LTE" "LTE" "LTE" ...
## $ BB60C : num -72.5 -73.5 -73.9 -74 -74.1 ...
## $ srsRAN : num -85 -84.8 -84.8 -87.3 -85.9 ...
## $ BladeRFxA9 : num -75.1 -77.9 -77.2 -77.9 -75.6 ...
summary(data)
## Timestamp Locality Latitude Longitude
## Length:12621 Length:12621 Min. :25.41 Min. :84.96
## Class :character Class :character 1st Qu.:25.52 1st Qu.:85.07
## Mode :character Mode :character Median :25.59 Median :85.14
## Mean :25.59 Mean :85.14
## 3rd Qu.:25.67 3rd Qu.:85.21
## Max. :25.77 Max. :85.32
## SignalStrength DataThroughput Latency NetworkType
## Min. :-116.94 Min. : 1.001 Min. : 10.02 Length:12621
## 1st Qu.: -94.88 1st Qu.: 2.492 1st Qu.: 39.96 Class :character
## Median : -91.41 Median : 6.463 Median : 75.21 Mode :character
## Mean : -91.76 Mean :20.909 Mean : 85.28
## 3rd Qu.: -88.34 3rd Qu.:31.504 3rd Qu.:125.96
## Max. : -74.64 Max. :99.986 Max. :199.99
## BB60C srsRAN BladeRFxA9
## Min. :-115.67 Min. :-124.65 Min. :-119.21
## 1st Qu.: -95.49 1st Qu.:-102.55 1st Qu.: -95.17
## Median : -91.60 Median : -98.96 Median : -91.46
## Mean : -91.77 Mean : -99.26 Mean : -91.77
## 3rd Qu.: -87.79 3rd Qu.: -95.67 3rd Qu.: -88.15
## Max. : -72.50 Max. : -81.32 Max. : -74.51
Response variable: SignalStrength
response_variable <- data$SignalStrength
summary(response_variable)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -116.94 -94.88 -91.41 -91.76 -88.34 -74.64
ggplot(data, aes(x = SignalStrength)) +
geom_histogram() +
labs(title = "Signal Strength Distribution", x = "Signal Strength")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
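The message above indicates that the histogram fell back to the default of 30 bins. One common way to choose a bin width from the data is the Freedman-Diaconis rule. The sketch below is only an illustration: it uses simulated values as a stand-in for `data$SignalStrength` (an assumption, since the CSV is not available here), and the names `signal`, `fd_binwidth`, and `fd_bins` are hypothetical.

```r
# Simulated stand-in for data$SignalStrength (assumption: the real values
# come from signal_metrics.csv, which is not read here).
set.seed(42)
signal <- rnorm(1000, mean = -92, sd = 5)

# Freedman-Diaconis rule: binwidth = 2 * IQR / n^(1/3)
fd_binwidth <- 2 * IQR(signal) / length(signal)^(1 / 3)

# Base R offers the same rule expressed as a suggested number of bins.
fd_bins <- nclass.FD(signal)

print(fd_binwidth)
print(fd_bins)

# With ggplot2 loaded, the width could then be passed explicitly, e.g.:
# ggplot(data, aes(x = SignalStrength)) + geom_histogram(binwidth = fd_binwidth)
```

Passing an explicit `binwidth` (or `bins`) to `geom_histogram()` also silences the `stat_bin()` message.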
Explanatory variable: NetworkType
Null Hypothesis (H0): Mean signal strength is the same across all network types.
Alternative Hypothesis (H1): Mean signal strength differs for at least one network type.
data$ConsolidatedNetworkType <- ifelse(data$NetworkType %in% c("LTE", "4G"), "4G",
ifelse(data$NetworkType %in% c("3G", "UMTS"), "3G", "Other"))
anova_result <- aov(SignalStrength ~ ConsolidatedNetworkType, data = data)
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## ConsolidatedNetworkType 1 71781 71781 3831 <2e-16 ***
## Residuals 12619 236455 19
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
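The consolidation step and the single degree of freedom in the ANOVA table can be illustrated on a toy vector. The labels below are hypothetical and are not taken from `signal_metrics.csv`. The sketch only shows how the nested `ifelse()` maps raw labels to groups, and that a factor with k observed levels contributes k - 1 degrees of freedom.

```r
# Hypothetical raw labels (assumption: chosen only to exercise every branch).
network_type <- c("LTE", "4G", "5G", "LTE", "3G", "UMTS")

# Same nested ifelse() mapping as in the analysis above.
consolidated <- ifelse(network_type %in% c("LTE", "4G"), "4G",
                ifelse(network_type %in% c("3G", "UMTS"), "3G", "Other"))

print(table(consolidated))

# A factor with k observed levels uses k - 1 degrees of freedom in a one-way ANOVA.
k <- nlevels(factor(consolidated))
print(k - 1)
```

In this toy vector all three groups occur, so the factor would use 2 degrees of freedom. The value of 1 reported in the table above implies that only two consolidated groups occur in the actual data.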
tukey_result <- TukeyHSD(anova_result)
# Select the adjusted p-values by column name rather than by position
p_values <- tukey_result$ConsolidatedNetworkType[, "p adj"]
alpha <- 0.05
significant_comparisons <- sum(p_values < alpha)
if (significant_comparisons > 0) {
cat("Reject the null hypothesis. There are significant differences in mean 'SignalStrength' among network types.\n")
} else {
cat("Fail to reject the null hypothesis. There are no significant differences in mean 'SignalStrength' among network types.\n")
}
## Reject the null hypothesis. There are significant differences in mean 'SignalStrength' among network types.
A one-way ANOVA was conducted to test whether mean signal strength differs among the consolidated network types. The test gives F = 3831 on 1 and 12619 degrees of freedom, with a p-value below 2e-16. Since the p-value is less than the chosen significance level of 0.05, we reject the null hypothesis. This suggests that mean signal strength differs among network types. Because the factor has a single degree of freedom, only two consolidated groups are present in the data, so the Tukey HSD step amounts to one pairwise comparison.
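With more than twelve thousand observations, a very small p-value says little about how large the difference is. As a rough complement, an effect size (eta squared) can be computed directly from the sums of squares printed in the ANOVA table. The sketch below uses those printed, rounded values. The variable names are hypothetical and the result is approximate for that reason.

```r
# Values copied from the ANOVA table above (rounded as printed).
ss_between <- 71781   # Sum Sq for ConsolidatedNetworkType
ss_within  <- 236455  # Sum Sq for Residuals
df_between <- 1
df_within  <- 12619

# Eta squared: share of total variation attributed to the grouping factor.
eta_squared <- ss_between / (ss_between + ss_within)

# Sanity check: the F statistic is the ratio of the two mean squares.
f_value <- (ss_between / df_between) / (ss_within / df_within)

print(round(eta_squared, 3))
print(round(f_value))
```

Under these inputs, eta squared comes out at roughly 0.23, and the recomputed F agrees with the reported 3831 after rounding.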
Another Explanatory Variable: DataThroughput
Null Hypothesis (H0): There is no linear relationship between “SignalStrength” and “DataThroughput.”
Alternative Hypothesis (H1): There is a linear relationship between “SignalStrength” and “DataThroughput.”
lm_model <- lm(SignalStrength ~ DataThroughput, data = data)
summary(lm_model)
##
## Call:
## lm(formula = SignalStrength ~ DataThroughput, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.3227 -2.9317 0.1539 3.0324 18.5925
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -90.237529 0.049884 -1808.96 <2e-16 ***
## DataThroughput -0.072815 0.001423 -51.16 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.498 on 12619 degrees of freedom
## Multiple R-squared: 0.1718, Adjusted R-squared: 0.1717
## F-statistic: 2618 on 1 and 12619 DF, p-value: < 2.2e-16
p_value <- summary(lm_model)$coefficients["DataThroughput", "Pr(>|t|)"]
alpha <- 0.05
if (p_value < alpha) {
cat("Reject the null hypothesis. There is a significant linear relationship between DataThroughput and SignalStrength.\n")
} else {
cat("Fail to reject the null hypothesis. There is no significant linear relationship between DataThroughput and SignalStrength.\n")
}
## Reject the null hypothesis. There is a significant linear relationship between DataThroughput and SignalStrength.
A simple linear regression was fitted to explore the relationship between “SignalStrength” and “DataThroughput.” The p-value associated with “DataThroughput” is below 2e-16. Since it is less than the significance level of 0.05, we reject the null hypothesis. This suggests that there is a linear relationship between “SignalStrength” and “DataThroughput.” The estimated slope is -0.0728, so higher throughput is associated with slightly lower signal strength, and the model explains about 17% of the variance in signal strength (R-squared = 0.1718).
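For a regression with a single predictor, the reported statistics are tied together: the F statistic equals the squared t value of the slope, and R-squared equals F / (F + residual df). The sketch below checks this using the rounded numbers printed in the summary above. The variable names are hypothetical, and small discrepancies come from rounding in the printout.

```r
# Values copied from the regression summary above (rounded as printed).
f_stat   <- 2618    # F-statistic
df_resid <- 12619   # residual degrees of freedom
t_value  <- -51.16  # t value of the DataThroughput slope

# With one predictor: R^2 = F / (F + residual df).
r_squared <- f_stat / (f_stat + df_resid)

# The correlation has the sign of the slope and magnitude sqrt(R^2).
r <- sign(t_value) * sqrt(r_squared)

print(round(r_squared, 4))
print(round(r, 3))

# t^2 should match F up to rounding in the printed output.
print(t_value^2)
```

The recovered R-squared matches the reported 0.1718, and the implied correlation of about -0.41 indicates a moderate negative association.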
plot(lm_model, which = 1)
plot(lm_model, which = 2)
plot(lm_model, which = 3)
plot(lm_model, which = 4)
In this model, “SignalStrength” is regressed on “Latitude,” “ConsolidatedNetworkType,” and their interaction.
model <- lm(SignalStrength ~ Latitude * ConsolidatedNetworkType, data = data)
summary(model)
##
## Call:
## lm(formula = SignalStrength ~ Latitude * ConsolidatedNetworkType,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.6406 -2.9384 0.0568 2.8771 17.9980
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -69.4828 13.4406 -5.170 2.38e-07 ***
## Latitude -0.8048 0.5251 -1.533 0.125
## ConsolidatedNetworkTypeOther 29.9615 23.3590 1.283 0.200
## Latitude:ConsolidatedNetworkTypeOther -1.3686 0.9126 -1.500 0.134
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.327 on 12617 degrees of freedom
## Multiple R-squared: 0.2335, Adjusted R-squared: 0.2334
## F-statistic: 1281 on 3 and 12617 DF, p-value: < 2.2e-16
plot(model, which = 1)
plot(model, which = 2)
plot(model, which = 3)
plot(model, which = 4)
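The formula `Latitude * ConsolidatedNetworkType` expands to both main effects plus their interaction, which is why the summary above lists four coefficients. The sketch below shows the expansion on a small made-up data frame (the `toy` values are an assumption, not rows from the dataset) and then combines the printed estimates to get the implied Latitude slope within each group.

```r
# Made-up rows (assumption), only to show how the formula expands.
toy <- data.frame(
  Latitude = c(25.4, 25.5, 25.6, 25.7),
  ConsolidatedNetworkType = c("4G", "Other", "4G", "Other")
)

# model.matrix() builds the design matrix that lm() would use.
design <- model.matrix(~ Latitude * ConsolidatedNetworkType, data = toy)
print(colnames(design))

# Estimates copied from the summary above (rounded as printed).
slope_4g    <- -0.8048             # Latitude slope in the baseline group (4G)
slope_other <- -0.8048 + -1.3686   # baseline slope plus the interaction term

print(slope_4g)
print(slope_other)
```

The interaction coefficient is the difference between the two group slopes. Since neither slope term is individually significant at the 0.05 level in the summary above, these values should be read as descriptive only.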
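Taken together, these analyses point to factors associated with signal strength. Network type and data throughput both show statistically significant associations with signal strength. In the interaction model, neither Latitude nor its interaction with network type is individually significant at the 0.05 level, although the overall model is. Understanding these factors can help optimize network performance and user experience.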