R Markdown

Importing libraries:

library(readr)
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ stringr   1.5.0
## ✔ forcats   1.0.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(pwr)

Load and Explore Data:

data <- read.csv("C:/Users/prase/OneDrive/Documents/STATISTICS/signal_metrics.csv")
head(data)
##   Timestamp       Locality Latitude Longitude SignalStrength DataThroughput
## 1   51:30.7        Danapur 25.42617  85.09443      -76.72446       1.105452
## 2   23:56.4      Bankipore 25.59105  85.25081      -77.52335       2.476287
## 3   24:39.7  Ashok Rajpath 25.48233  85.14868      -78.55790       1.031408
## 4   02:26.4 Rajendra Nagar 25.46116  85.23826      -78.77064       1.461008
## 5   32:12.7  Ashok Rajpath 25.61583  85.10455      -77.27129       1.792531
## 6   58:31.2 Rajendra Nagar 25.56698  85.12149      -75.67285       2.572450
##    Latency NetworkType     BB60C    srsRAN BladeRFxA9
## 1 138.9383         LTE -72.50342 -84.97208  -75.12779
## 2 137.6606         LTE -73.45848 -84.77590  -77.94294
## 3 165.4447         LTE -73.88210 -84.76128  -77.21692
## 4 101.6800         LTE -74.04047 -87.27312  -77.86791
## 5 177.4726         LTE -74.08004 -85.93112  -75.57369
## 6 131.5178         LTE -74.66450 -85.16332  -74.51283
str(data)
## 'data.frame':    12621 obs. of  11 variables:
##  $ Timestamp     : chr  "51:30.7" "23:56.4" "24:39.7" "02:26.4" ...
##  $ Locality      : chr  "Danapur" "Bankipore" "Ashok Rajpath" "Rajendra Nagar" ...
##  $ Latitude      : num  25.4 25.6 25.5 25.5 25.6 ...
##  $ Longitude     : num  85.1 85.3 85.1 85.2 85.1 ...
##  $ SignalStrength: num  -76.7 -77.5 -78.6 -78.8 -77.3 ...
##  $ DataThroughput: num  1.11 2.48 1.03 1.46 1.79 ...
##  $ Latency       : num  139 138 165 102 177 ...
##  $ NetworkType   : chr  "LTE" "LTE" "LTE" "LTE" ...
##  $ BB60C         : num  -72.5 -73.5 -73.9 -74 -74.1 ...
##  $ srsRAN        : num  -85 -84.8 -84.8 -87.3 -85.9 ...
##  $ BladeRFxA9    : num  -75.1 -77.9 -77.2 -77.9 -75.6 ...
summary(data)
##   Timestamp           Locality            Latitude       Longitude    
##  Length:12621       Length:12621       Min.   :25.41   Min.   :84.96  
##  Class :character   Class :character   1st Qu.:25.52   1st Qu.:85.07  
##  Mode  :character   Mode  :character   Median :25.59   Median :85.14  
##                                        Mean   :25.59   Mean   :85.14  
##                                        3rd Qu.:25.67   3rd Qu.:85.21  
##                                        Max.   :25.77   Max.   :85.32  
##  SignalStrength    DataThroughput      Latency       NetworkType       
##  Min.   :-116.94   Min.   : 1.001   Min.   : 10.02   Length:12621      
##  1st Qu.: -94.88   1st Qu.: 2.492   1st Qu.: 39.96   Class :character  
##  Median : -91.41   Median : 6.463   Median : 75.21   Mode  :character  
##  Mean   : -91.76   Mean   :20.909   Mean   : 85.28                     
##  3rd Qu.: -88.34   3rd Qu.:31.504   3rd Qu.:125.96                     
##  Max.   : -74.64   Max.   :99.986   Max.   :199.99                     
##      BB60C             srsRAN          BladeRFxA9     
##  Min.   :-115.67   Min.   :-124.65   Min.   :-119.21  
##  1st Qu.: -95.49   1st Qu.:-102.55   1st Qu.: -95.17  
##  Median : -91.60   Median : -98.96   Median : -91.46  
##  Mean   : -91.77   Mean   : -99.26   Mean   : -91.77  
##  3rd Qu.: -87.79   3rd Qu.: -95.67   3rd Qu.: -88.15  
##  Max.   : -72.50   Max.   : -81.32   Max.   : -74.51

Selecting a continuous column:

Response variable: SignalStrength

response_variable <- data$SignalStrength
summary(response_variable)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -116.94  -94.88  -91.41  -91.76  -88.34  -74.64
ggplot(data, aes(x = SignalStrength)) +
  geom_histogram() +
  labs(title = "Signal Strength Distribution", x = "Signal Strength")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
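The `stat_bin()` message above asks for an explicit bin width. One common choice is the Freedman-Diaconis rule, sketched here using only the quartiles and row count printed earlier. The rule itself is an assumption of this sketch, not something the original analysis used.

```r
# Freedman-Diaconis bin width: 2 * IQR / n^(1/3).
# Inputs are copied from the summary() and str() output above.
q1 <- -94.88   # 1st quartile of SignalStrength
q3 <- -88.34   # 3rd quartile of SignalStrength
n  <- 12621    # number of observations

fd_binwidth <- 2 * (q3 - q1) / n^(1/3)
print(round(fd_binwidth, 2))
```

The result could be passed on as `geom_histogram(binwidth = fd_binwidth)`; rounding it to a convenient value is equally reasonable.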

Selecting a categorical column:

Explanatory variable: NetworkType

Null Hypothesis (H0): There is no significant difference in mean signal strength among different network types.

Alternative Hypothesis (H1): There is a significant difference in mean signal strength among different network types.

data$ConsolidatedNetworkType <- ifelse(data$NetworkType %in% c("LTE", "4G"), "4G",
                                       ifelse(data$NetworkType %in% c("3G", "UMTS"), "3G", "Other"))
anova_result <- aov(SignalStrength ~ ConsolidatedNetworkType, data = data)
summary(anova_result)
##                            Df Sum Sq Mean Sq F value Pr(>F)    
## ConsolidatedNetworkType     1  71781   71781    3831 <2e-16 ***
## Residuals               12619 236455      19                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
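# Post-hoc pairwise comparisons of the group means
tukey_result <- TukeyHSD(anova_result)
# Adjusted p-values, selected by column name rather than by position
p_values <- tukey_result$ConsolidatedNetworkType[, "p adj"]
alpha <- 0.05
# Count the pairwise comparisons that are significant at level alpha
significant_comparisons <- sum(p_values < alpha)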
if (significant_comparisons > 0) {
  cat("Reject the null hypothesis. There are significant differences in mean 'SignalStrength' among network types.\n")
} else {
  cat("Fail to reject the null hypothesis. There are no significant differences in mean 'SignalStrength' among network types.\n")
}
## Reject the null hypothesis. There are significant differences in mean 'SignalStrength' among network types.

A one-way ANOVA was conducted to determine whether mean signal strength differs among network types. The test gives F = 3831 on 1 and 12619 degrees of freedom, with a p-value below 2e-16. Since the p-value is less than the chosen significance level of 0.05, we reject the null hypothesis. This suggests that mean signal strength differs significantly among network types. Because the factor has a single degree of freedom, only two consolidated network types are present in the data, so the Tukey comparison above covers the same single contrast as the ANOVA itself.
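A significant p-value with more than 12,000 observations says little about how large the difference is. As a rough sketch, an effect size (eta squared) can be computed directly from the sums of squares in the ANOVA table. The variable names below are illustrative and not part of the original analysis.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss_between <- 71781
ss_resid   <- 236455
df_between <- 1
df_resid   <- 12619

# Recompute F as a consistency check against the printed value (3831).
f_stat <- (ss_between / df_between) / (ss_resid / df_resid)

# Eta squared: share of the total variation associated with network type.
eta_sq <- ss_between / (ss_between + ss_resid)

print(round(c(F = f_stat, eta_squared = eta_sq), 3))
```

Under these inputs, network type accounts for roughly 23% of the variation in signal strength.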

Selecting another continuous variable:

Second variable: DataThroughput (entered as the predictor; SignalStrength remains the response)

Null Hypothesis (H0): There is no linear relationship between “SignalStrength” and “DataThroughput.”

Alternative Hypothesis (H1): There is a linear relationship between “SignalStrength” and “DataThroughput.”

lm_model <- lm(SignalStrength ~ DataThroughput, data = data)
summary(lm_model)
## 
## Call:
## lm(formula = SignalStrength ~ DataThroughput, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.3227  -2.9317   0.1539   3.0324  18.5925 
## 
## Coefficients:
##                  Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)    -90.237529   0.049884 -1808.96   <2e-16 ***
## DataThroughput  -0.072815   0.001423   -51.16   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.498 on 12619 degrees of freedom
## Multiple R-squared:  0.1718, Adjusted R-squared:  0.1717 
## F-statistic:  2618 on 1 and 12619 DF,  p-value: < 2.2e-16
p_value <- summary(lm_model)$coefficients["DataThroughput", "Pr(>|t|)"]
alpha <- 0.05
if (p_value < alpha) {
  cat("Reject the null hypothesis. There is a significant linear relationship between DataThroughput and SignalStrength.\n")
} else {
  cat("Fail to reject the null hypothesis. There is no significant linear relationship between DataThroughput and SignalStrength.\n")
}
## Reject the null hypothesis. There is a significant linear relationship between DataThroughput and SignalStrength.

A simple linear regression was fitted to explore the relationship between “SignalStrength” and “DataThroughput.” The p-value associated with “DataThroughput” is below 2e-16. Since the p-value is less than the significance level of 0.05, we reject the null hypothesis. This suggests that there is a linear relationship between the two variables: the estimated slope is -0.0728, so signal strength falls slightly as throughput rises, and the model explains about 17% of the variance (R-squared = 0.1718).
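To make the fitted line concrete, the sketch below plugs the printed coefficients into a small prediction function. `predict_signal` is a hypothetical helper introduced for illustration; with the fitted object available, `predict(lm_model, newdata = ...)` would serve the same purpose.

```r
# Coefficients copied from the summary(lm_model) output above.
intercept <- -90.237529
slope     <- -0.072815

# Hypothetical helper: predicted SignalStrength for a given DataThroughput.
predict_signal <- function(throughput) intercept + slope * throughput

# A least-squares line passes through the point of means, so the prediction
# at the mean throughput (20.909) should sit close to the mean signal
# strength reported by summary(data), which is -91.76.
pred_at_mean <- predict_signal(20.909)

# In simple regression the correlation is the signed square root of R-squared.
r <- sign(slope) * sqrt(0.1718)

print(round(c(pred_at_mean = pred_at_mean, correlation = r), 3))
```

The correlation of about -0.41 indicates a moderate negative association, which fits the small slope and the modest R-squared.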

Diagnostic plots:

plot(lm_model, which = 1)

plot(lm_model, which = 2)

plot(lm_model, which = 3)

plot(lm_model, which = 4)
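The four calls above draw, in order, the residuals-versus-fitted plot, the normal Q-Q plot, the scale-location plot, and Cook's distance. As a numerical companion to the last plot, the sketch below flags influential points with the common 4/n rule of thumb. It runs on simulated data (an assumption made only so the example is self-contained), and the cutoff is a convention rather than something the original analysis specified.

```r
# Simulated stand-in data; these values are NOT from signal_metrics.csv.
set.seed(1)
n <- 200
x <- runif(n, min = 1, max = 100)
y <- -90 - 0.07 * x + rnorm(n, sd = 4.5)

toy_fit <- lm(y ~ x)

# Cook's distance for every observation (the quantity drawn by which = 4).
cooks <- cooks.distance(toy_fit)

# Rule-of-thumb cutoff: flag observations whose distance exceeds 4 / n.
threshold <- 4 / n
flagged <- which(cooks > threshold)

print(length(flagged))
```

Applied to the real fit, the same steps would use `cooks.distance(lm_model)` and `nrow(data)` in place of the simulated objects.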

Adding another variable to the regression model:

In this model, we use “Latitude” and “ConsolidatedNetworkType,” together with their interaction, as predictors of “SignalStrength.” “DataThroughput” is not included in this fit.

model <- lm(SignalStrength ~ Latitude * ConsolidatedNetworkType, data = data)
summary(model)
## 
## Call:
## lm(formula = SignalStrength ~ Latitude * ConsolidatedNetworkType, 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.6406  -2.9384   0.0568   2.8771  17.9980 
## 
## Coefficients:
##                                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           -69.4828    13.4406  -5.170 2.38e-07 ***
## Latitude                               -0.8048     0.5251  -1.533    0.125    
## ConsolidatedNetworkTypeOther           29.9615    23.3590   1.283    0.200    
## Latitude:ConsolidatedNetworkTypeOther  -1.3686     0.9126  -1.500    0.134    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.327 on 12617 degrees of freedom
## Multiple R-squared:  0.2335, Adjusted R-squared:  0.2334 
## F-statistic:  1281 on 3 and 12617 DF,  p-value: < 2.2e-16
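Coefficients in an interaction model are easier to read once they are combined into one line per group. The sketch below does that with the printed estimates. It assumes the reference level is the "4G" group, which is inferred from the coefficient names rather than stated in the output, and the variable names are illustrative.

```r
# Estimates copied from the summary(model) output above.
b0      <- -69.4828   # intercept for the reference group (assumed "4G")
b_lat   <- -0.8048    # Latitude slope for the reference group
b_other <- 29.9615    # intercept shift for the "Other" group
b_int   <- -1.3686    # slope shift for the "Other" group

# Group-specific lines: SignalStrength = intercept + slope * Latitude
intercept_other <- b0 + b_other
slope_other     <- b_lat + b_int

# Gap between the two groups at the mean latitude (25.59 from summary(data)).
mean_lat <- 25.59
gap_at_mean <- b_other + b_int * mean_lat

print(round(c(intercept_other = intercept_other,
              slope_other = slope_other,
              gap_at_mean = gap_at_mean), 3))
```

The large intercept shift for "Other" describes the gap at a latitude of zero, far outside the observed range of 25.41 to 25.77, which is one reason it is not significant on its own. At the mean latitude the "Other" group is predicted to be roughly 5 units lower.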

Diagnostic plots:

plot(model, which = 1)

plot(model, which = 2)

plot(model, which = 3)

plot(model, which = 4)

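Taken together, these analyses indicate which factors are associated with signal strength: it differs significantly by network type and decreases slightly as data throughput increases, while neither the Latitude slope nor its interaction with network type is significant at the 0.05 level. Understanding these factors can help optimize network performance and user experience.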