library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(dplyr)
library(tsibble)
## Warning: package 'tsibble' was built under R version 4.3.2
##
## Attaching package: 'tsibble'
##
## The following object is masked from 'package:lubridate':
##
## interval
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.3.2
## corrplot 0.92 loaded
library(hexbin)
## Warning: package 'hexbin' was built under R version 4.3.2
data <- read.csv("C:/Users/prase/OneDrive/Documents/STATISTICS/signal_metrics.csv")
head(data)
## Timestamp Locality Latitude Longitude SignalStrength DataThroughput
## 1 51:30.7 Danapur 25.42617 85.09443 -76.72446 1.105452
## 2 23:56.4 Bankipore 25.59105 85.25081 -77.52335 2.476287
## 3 24:39.7 Ashok Rajpath 25.48233 85.14868 -78.55790 1.031408
## 4 02:26.4 Rajendra Nagar 25.46116 85.23826 -78.77064 1.461008
## 5 32:12.7 Ashok Rajpath 25.61583 85.10455 -77.27129 1.792531
## 6 58:31.2 Rajendra Nagar 25.56698 85.12149 -75.67285 2.572450
## Latency NetworkType BB60C srsRAN BladeRFxA9
## 1 138.9383 LTE -72.50342 -84.97208 -75.12779
## 2 137.6606 LTE -73.45848 -84.77590 -77.94294
## 3 165.4447 LTE -73.88210 -84.76128 -77.21692
## 4 101.6800 LTE -74.04047 -87.27312 -77.86791
## 5 177.4726 LTE -74.08004 -85.93112 -75.57369
## 6 131.5178 LTE -74.66450 -85.16332 -74.51283
str(data)
## 'data.frame': 12621 obs. of 11 variables:
## $ Timestamp : chr "51:30.7" "23:56.4" "24:39.7" "02:26.4" ...
## $ Locality : chr "Danapur" "Bankipore" "Ashok Rajpath" "Rajendra Nagar" ...
## $ Latitude : num 25.4 25.6 25.5 25.5 25.6 ...
## $ Longitude : num 85.1 85.3 85.1 85.2 85.1 ...
## $ SignalStrength: num -76.7 -77.5 -78.6 -78.8 -77.3 ...
## $ DataThroughput: num 1.11 2.48 1.03 1.46 1.79 ...
## $ Latency : num 139 138 165 102 177 ...
## $ NetworkType : chr "LTE" "LTE" "LTE" "LTE" ...
## $ BB60C : num -72.5 -73.5 -73.9 -74 -74.1 ...
## $ srsRAN : num -85 -84.8 -84.8 -87.3 -85.9 ...
## $ BladeRFxA9 : num -75.1 -77.9 -77.2 -77.9 -75.6 ...
summary(data)
## Timestamp Locality Latitude Longitude
## Length:12621 Length:12621 Min. :25.41 Min. :84.96
## Class :character Class :character 1st Qu.:25.52 1st Qu.:85.07
## Mode :character Mode :character Median :25.59 Median :85.14
## Mean :25.59 Mean :85.14
## 3rd Qu.:25.67 3rd Qu.:85.21
## Max. :25.77 Max. :85.32
## SignalStrength DataThroughput Latency NetworkType
## Min. :-116.94 Min. : 1.001 Min. : 10.02 Length:12621
## 1st Qu.: -94.88 1st Qu.: 2.492 1st Qu.: 39.96 Class :character
## Median : -91.41 Median : 6.463 Median : 75.21 Mode :character
## Mean : -91.76 Mean :20.909 Mean : 85.28
## 3rd Qu.: -88.34 3rd Qu.:31.504 3rd Qu.:125.96
## Max. : -74.64 Max. :99.986 Max. :199.99
## BB60C srsRAN BladeRFxA9
## Min. :-115.67 Min. :-124.65 Min. :-119.21
## 1st Qu.: -95.49 1st Qu.:-102.55 1st Qu.: -95.17
## Median : -91.60 Median : -98.96 Median : -91.46
## Mean : -91.77 Mean : -99.26 Mean : -91.77
## 3rd Qu.: -87.79 3rd Qu.: -95.67 3rd Qu.: -88.15
## Max. : -72.50 Max. : -81.32 Max. : -74.51
missing_values <- colSums(is.na(data))
missing_values <- missing_values[missing_values > 0]
if(length(missing_values) > 0) {
cat("Columns with missing values:\n")
print(missing_values)
} else {
cat("No missing values in the dataset.\n")
}
## No missing values in the dataset.
unique_values <- lapply(data[, c("Locality", "NetworkType")], unique)
unique_values
## $Locality
## [1] "Danapur" "Bankipore" "Ashok Rajpath"
## [4] "Rajendra Nagar" "Anisabad" "Fraser Road"
## [7] "Anandpuri" "Kumhrar" "Pataliputra"
## [10] "Gardanibagh" "Bailey Road" "Exhibition Road"
## [13] "Gandhi Maidan" "Boring Canal Road" "Patliputra Colony"
## [16] "Kankarbagh" "Boring Road" "Phulwari Sharif"
## [19] "S.K. Puri" "Kidwaipuri"
##
## $NetworkType
## [1] "LTE" "4G" "5G"
Reliable connectivity is now part of daily life, so understanding where and why network performance falls short matters. This analysis examines signal strength, data throughput, and latency across diverse localities to find patterns that telecommunication providers and urban planners can act on.
The analysis seeks to identify areas of suboptimal performance, enabling proactive measures to enhance service quality and guide strategic urban development. Additionally, comparing the signal readings recorded in the BB60C, srsRAN, and BladeRFxA9 columns aims to contribute to research in the broader field of networking.
cor_matrix <- cor(data[, c("SignalStrength", "DataThroughput", "Latency", "BB60C", "srsRAN", "BladeRFxA9")])
cor_matrix
## SignalStrength DataThroughput Latency BB60C srsRAN
## SignalStrength 1.0000000 -0.4144952 0.3586899 0.8626835 0.9596663
## DataThroughput -0.4144952 1.0000000 -0.6497991 -0.3586318 -0.3965633
## Latency 0.3586899 -0.6497991 1.0000000 0.3104784 0.3404826
## BB60C 0.8626835 -0.3586318 0.3104784 1.0000000 0.8275234
## srsRAN 0.9596663 -0.3965633 0.3404826 0.8275234 1.0000000
## BladeRFxA9 0.9436984 -0.3918506 0.3376630 0.8162431 0.9045379
## BladeRFxA9
## SignalStrength 0.9436984
## DataThroughput -0.3918506
## Latency 0.3376630
## BB60C 0.8162431
## srsRAN 0.9045379
## BladeRFxA9 1.0000000
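Insight: The matrix lists pairwise Pearson correlation coefficients. The four signal readings (SignalStrength, BB60C, srsRAN, BladeRFxA9) are strongly and positively correlated with one another (0.82 to 0.96). DataThroughput and Latency are negatively correlated (-0.65). SignalStrength is negatively correlated with DataThroughput (-0.41) and positively correlated with Latency (0.36), the opposite of what stronger signals would normally be expected to produce.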
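One way to read a matrix of this size is to rank the off-diagonal pairs by absolute correlation. The sketch below rebuilds the matrix from the coefficients printed above so that it runs on its own; in the live session the existing `cor_matrix` object could be used directly. The name `ranked` is illustrative and not part of the original analysis.

```r
# Variable names and coefficients copied from the printed correlation matrix.
vars <- c("SignalStrength", "DataThroughput", "Latency", "BB60C", "srsRAN", "BladeRFxA9")
cor_matrix <- matrix(c(
   1.0000000, -0.4144952,  0.3586899,  0.8626835,  0.9596663,  0.9436984,
  -0.4144952,  1.0000000, -0.6497991, -0.3586318, -0.3965633, -0.3918506,
   0.3586899, -0.6497991,  1.0000000,  0.3104784,  0.3404826,  0.3376630,
   0.8626835, -0.3586318,  0.3104784,  1.0000000,  0.8275234,  0.8162431,
   0.9596663, -0.3965633,  0.3404826,  0.8275234,  1.0000000,  0.9045379,
   0.9436984, -0.3918506,  0.3376630,  0.8162431,  0.9045379,  1.0000000
), nrow = 6, byrow = TRUE, dimnames = list(vars, vars))

# Keep each pair once (upper triangle), then sort by absolute correlation.
idx <- which(upper.tri(cor_matrix), arr.ind = TRUE)
ranked <- data.frame(
  var1 = vars[idx[, "row"]],
  var2 = vars[idx[, "col"]],
  r    = cor_matrix[idx]
)
ranked <- ranked[order(-abs(ranked$r)), ]
print(ranked, row.names = FALSE)
```

The top of the list is the SignalStrength and srsRAN pair, and the weakest pairs all involve Latency or DataThroughput against a signal reading.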
ggplot(data, aes(x = SignalStrength)) +
geom_histogram(binwidth = 2, fill = "skyblue", color = "black") +
labs(title = "Distribution of Signal Strength", x = "Signal Strength (dB)")
The x-axis (horizontal axis) represents the signal strength in dB, and the y-axis (vertical axis) represents the count of occurrences within the dataset for each bin of signal strength. The bins are shown as bars, and the height of each bar indicates the frequency of signal strength measurements that fall within that range.
The distribution of signal strength seems to be roughly bell-shaped, indicating that most of the signal strength measurements are clustered around a central range. The peak of the histogram appears to be somewhere between -100 dB and -90 dB, which suggests that this is the most common range of signal strength measured in this dataset.
In general, for wireless signals, a higher value (closer to 0) indicates a stronger signal, while a lower value (further into the negative) indicates a weaker signal. Received signal power in cellular networks is normally reported in dBm (decibels relative to one milliwatt); the axis here is labeled dB, but the values fall in the range typical of dBm readings. The specific meaning of these values can vary with the context and the technology being measured. For example, in cellular networks, a signal strength of about -50 dBm is considered excellent, while about -110 dBm may be considered poor.
Without additional context about the dataset and what constitutes a “good” or “bad” signal strength, it’s difficult to make further interpretations. However, this histogram provides a visual representation of how signal strength varies across different measurements within the dataset.
Insight: This histogram provides a visual representation of the distribution of Signal Strength. You can observe the frequency and concentration of signal strength values. It helps in identifying the predominant signal strength levels and potential outliers.
plot(data$Longitude, data$Latitude, main = "Geographical Distribution of Localities", xlab = "Longitude", ylab = "Latitude")
The x-axis (horizontal axis) represents the longitude, and the y-axis (vertical axis) represents the latitude. Each point on the graph is one measurement record (12,621 in total) plotted at its recorded latitude and longitude; the points are not the 20 localities themselves.
The plot shows a dense clustering of measurements across a specific range of latitudes and longitudes. The distribution is somewhat elliptical in shape, with a higher concentration of points in the central areas, tapering off towards the edges. This could suggest a higher population density or simply a greater number of recorded data points in the center of the distribution.
According to the summary statistics, the measurements span longitudes from approximately 84.96 to 85.32 and latitudes from about 25.41 to 25.77. These coordinates and the locality names (for example Danapur, Bankipore, Gandhi Maidan, and Kankarbagh) are consistent with the city of Patna, India, although this should be confirmed against a map or a geographic database. The data does not cover a wide area geographically, which fits a dataset focused on a single city.
This type of plot is often used in geographical information systems (GIS) and by network analysts to visualize the coverage or distribution of network measurements, population densities, or service areas within a given geographic region.
Insight: This scatter plot displays the geographical distribution of the measurements based on latitude and longitude. It helps in visualizing the spatial arrangement of the data points and identifying any geographical patterns or clusters.
boxplot(data$Latency ~ data$NetworkType, main = "Latency by Network Type", xlab = "Network Type", ylab = "Latency (ms)")
The bottom and top of the box are the first and third quartiles, so the box spans the interquartile range. The line in the middle of the box is the median, which is the middle value when the numbers are put in order. The “whiskers” extend from the quartiles to the minimum and maximum values, excluding outliers. Any points outside of the whiskers could be considered as outliers (although this plot does not appear to show any).
4G Box Plot: Shows a wider interquartile range (i.e., the box is taller), indicating more variability in latency measurements for 4G. The median latency is higher compared to 5G but lower than LTE.
5G Box Plot: Shows a smaller interquartile range, suggesting that latency is more consistent across measurements for 5G. It also has the lowest median latency, which is expected as 5G technology is designed to offer lower latency.
LTE Box Plot: Shows a somewhat wider interquartile range, similar to 4G, with the highest median latency among the three.
From this plot, we can deduce that 5G networks tend to have lower and more consistent latency compared to 4G and LTE networks. LTE appears to have the highest latency. This data is important for network engineers and consumers, as lower latency is crucial for a better user experience, particularly for real-time applications like video calls, gaming, and other interactive services.
Insight: This boxplot compares the distribution of Latency across different Network Types. It helps in understanding the central tendency, spread, and potential outliers in Latency for each network type.
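The whisker rule described above can be checked numerically. By default, `boxplot()` draws each whisker to the most extreme data point that lies within 1.5 times the box height of the box, and plots anything beyond that individually. The sketch below uses `boxplot.stats()` from base R on a small set of made-up latency values; the numbers are hypothetical and are not taken from the dataset.

```r
# Hypothetical latency values in ms; 95 is deliberately far from the rest.
latency <- c(10, 12, 13, 15, 18, 20, 95)

bs <- boxplot.stats(latency)  # default coef = 1.5, the same rule boxplot() uses

# Five numbers: lower whisker, lower hinge, median, upper hinge, upper whisker.
print(bs$stats)

# Values beyond the whiskers, drawn as individual points in a boxplot.
print(bs$out)
```

On the real data, `boxplot.stats(data$Latency[data$NetworkType == "5G"])` would give the same summary for one network type, which is a quick way to confirm whether a plot truly has no outliers.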
plot(data$SignalStrength, data$DataThroughput, main = "Data Throughput vs Signal Strength", xlab = "Signal Strength (dB)", ylab = "Data Throughput (Mbps)")
The x-axis (horizontal axis) represents signal strength in dB. In wireless communications, signal strength is often expressed in negative dBm values, where a value closer to 0 represents a stronger signal.
The y-axis (vertical axis) represents data throughput in Mbps, which is a measure of how much data is successfully transferred from one place to another in a given amount of time.
The scatter plot shows a wide spread of data points, indicating various measurements of data throughput at different signal strengths. Most points sit at low throughput: the median is about 6.5 Mbps against a mean of about 20.9 Mbps, so the distribution is heavily right-skewed. Contrary to the usual expectation, throughput does not rise with signal strength in this dataset. The Pearson correlation between the two is -0.41, so stronger signals are, on average, paired with lower throughput. The first rows of the data show the pattern: LTE records combine comparatively strong signals (around -77) with throughput of only 1 to 2.5 Mbps.
The relationship is also far from linear; the data points are spread out, which implies that factors other than signal strength affect data throughput. Network congestion, interference, and especially the type of technology used (LTE, 4G, or 5G) could drive the data rates, and differences between network types may account for the negative overall correlation.
This type of analysis is useful for network performance evaluation, where understanding the relationship between signal strength and throughput can help in optimizing network coverage and capacity.
Insight: This scatter plot shows the relationship between Data Throughput and Signal Strength. It helps in identifying any trends or patterns and understanding how the two variables are related.
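The correlation matrix reports r = -0.41 between Signal Strength and Data Throughput for the pooled data. A pooled correlation can have the opposite sign from the correlation inside each group when the groups differ in both their typical signal level and their typical throughput. The sketch below illustrates the effect with a small synthetic data frame; all values in `sim` are invented for illustration, and whether the real dataset behaves this way would need to be checked by running the same `split()` and `cor()` step on `data`.

```r
# Synthetic example: within each network type, throughput rises with signal,
# but the type with the weakest signal has the highest throughput.
sim <- data.frame(
  NetworkType    = rep(c("LTE", "4G", "5G"), each = 5),
  SignalStrength = c(-80:-76, -92:-88, -100:-96),
  DataThroughput = c(1.0, 1.5, 2.0, 2.5, 3.0,
                     5, 6, 7, 8, 9,
                     40, 45, 50, 55, 60)
)

# Correlation over all rows, ignoring network type.
pooled <- cor(sim$SignalStrength, sim$DataThroughput)

# Correlation computed separately inside each network type.
within <- sapply(split(sim, sim$NetworkType),
                 function(g) cor(g$SignalStrength, g$DataThroughput))

print(pooled)
print(within)
```

In this toy data the pooled correlation is negative while every within-type correlation is positive, which is why a per-type breakdown is worth running before interpreting the pooled figure.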
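# Values look like 51:30.7 (minutes:seconds); %OS reads fractional seconds, and the missing date and hour default to today at hour 0.
data$Timestamp <- as.POSIXct(data$Timestamp, format = "%M:%OS")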
ggplot(data, aes(x = Timestamp, y = SignalStrength)) +
geom_line() +
labs(title = "Time Series Analysis of Signal Strength", x = "Timestamp", y = "Signal Strength (dB)")
The x-axis (horizontal axis) represents time. The Timestamp column holds only minutes and seconds (for example 51:30.7), with no date or hour, so the axis spans a single hour, with markers at 15-minute intervals. Measurements that were taken in different hours or on different days are folded onto that one hour.
The y-axis (vertical axis) represents signal strength in dB. As with most wireless signals, the signal strength values are negative, and a value closer to 0 represents a stronger signal.
The plot shows large swings in signal strength, drawn as dense, near-vertical strokes. These strokes are not error bars or minimum and maximum ranges. geom_line() connects consecutive observations in timestamp order, and because the 12,621 records are packed into one hour, many of them share almost the same minute and second, so the line jumps between their values.
The signal strength varies widely, from above -80 dB to below -110 dB, which indicates a highly variable signal environment. Such variability could be due to many factors like physical obstructions, interference from other signals, or the changing position of the receiver relative to the signal source.
This kind of analysis is crucial for understanding the stability of a network signal over time, and it can be particularly important in scenarios where a stable connection is required, such as in communication networks, mobile phone services, or other wireless technologies.
Insight: This time series plot tracks the changes in Signal Strength over time. It helps in identifying trends, seasonality, or any patterns in the signal strength values.
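Because the Timestamp values have the form minutes:seconds, one alternative to drawing a line through every record is to convert each timestamp to seconds past the hour and average the signal within fixed time bins. The sketch below does this for the six rows printed by `head(data)`; the 15-minute bin width and the names `secs`, `bin`, and `binned_mean` are assumptions made for illustration.

```r
# First six Timestamp and SignalStrength values, copied from head(data).
ts     <- c("51:30.7", "23:56.4", "24:39.7", "02:26.4", "32:12.7", "58:31.2")
signal <- c(-76.72446, -77.52335, -78.55790, -78.77064, -77.27129, -75.67285)

# Split "MM:SS.s" into minutes and seconds, then convert to seconds past the hour.
parts <- do.call(rbind, strsplit(ts, ":", fixed = TRUE))
secs  <- as.numeric(parts[, 1]) * 60 + as.numeric(parts[, 2])

# Average the signal within 15-minute (900 s) bins; the width is an arbitrary choice.
bin <- floor(secs / 900)
binned_mean <- tapply(signal, bin, mean)
print(binned_mean)
```

Applied to the full dataset, the binned means could be plotted with `geom_line()` to show any within-hour drift without the vertical strokes. The result would still describe position within the hour only, since the source data records no date or hour.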
Do Signal Strength and Latency have a significant linear relationship?
Hypotheses:
Null Hypothesis (H0): There is no significant linear correlation between Signal Strength and Latency.
Alternative Hypothesis (H1): There is a significant linear correlation between Signal Strength and Latency.
Statistical Test: Correlation test (e.g., Pearson correlation)
# Perform correlation test
cor_test_result <- cor.test(data$SignalStrength, data$Latency)
cor_test_result
##
## Pearson's product-moment correlation
##
## data: data$SignalStrength and data$Latency
## t = 43.166, df = 12619, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3433923 0.3737972
## sample estimates:
## cor
## 0.3586899
# Visualize the correlation
ggplot(data, aes(x = SignalStrength, y = Latency)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "green") +
labs(title = "Correlation between Signal Strength and Latency",
x = "Signal Strength (dB)",
y = "Latency (ms)")
## `geom_smooth()` using formula = 'y ~ x'
The correlation test between Signal Strength and Latency yielded a significant result (p-value < 2.2e-16), providing strong statistical evidence against the null hypothesis of no linear correlation. The Pearson correlation coefficient is 0.359, with a 95% confidence interval from 0.343 to 0.374. This indicates a positive but only weak-to-moderate linear relationship: as Signal Strength increases, Latency tends to increase as well, which runs against the usual expectation that stronger signals bring lower latency. The narrow confidence interval reflects the large sample (12,621 observations), so the estimate is precise even though the association itself is modest.
The result implies that Signal Strength and Latency are not independent in this dataset, but the association is loose: with r = 0.359, a straight-line fit on Signal Strength accounts for only about 13% of the variance in Latency. Network engineers could treat signal strength as one rough indicator of expected latency, but not as a reliable predictor on its own, and the correlation does not show that one causes the other. The test supports the conclusion that a statistically significant linear relationship exists; its practical size is small.
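The figures in the `cor.test()` output can be reproduced by hand, which is a useful check when quoting them in prose. The sketch below starts from the reported correlation and sample size and applies the Fisher z-transform, the standard construction for a Pearson confidence interval; the variable names are illustrative.

```r
r <- 0.3586899   # sample correlation from the cor.test() output
n <- 12621       # number of observations (df + 2)

r_squared <- r^2                 # share of variance explained by a straight line
z  <- atanh(r)                   # Fisher z-transform of r
se <- 1 / sqrt(n - 3)            # standard error on the z scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)   # back-transform to the r scale
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)      # test statistic for H0: rho = 0

print(round(c(r_squared = r_squared, lower = ci[1], upper = ci[2], t = t_stat), 4))
```

The interval and t statistic should agree with the printed output to the displayed precision, and `r_squared` comes out near 0.13.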
Is there a significant difference in Data Throughput between different Network Types considering the geographical location (Locality)?
Hypotheses:
Null Hypothesis (H0): The mean Data Throughput is the same across all localities and network types.
Alternative Hypothesis (H1): There is at least one locality where the mean Data Throughput differs between network types.
Statistical Test: Two-way Analysis of Variance (ANOVA)
# Perform two-way ANOVA
anova_result <- aov(DataThroughput ~ NetworkType * Locality, data = data)
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## NetworkType 2 7125143 3562571 15684.546 <2e-16 ***
## Locality 19 2947 155 0.683 0.840
## NetworkType:Locality 38 6826 180 0.791 0.817
## Residuals 12561 2853092 227
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The two-way ANOVA was conducted to investigate whether there is a significant difference in Data Throughput between different Network Types considering the geographical location (Locality). The results indicate a highly significant effect of Network Type on Data Throughput (F = 15684.546, p-value < 2e-16), rejecting the null hypothesis that the mean Data Throughput is the same across all network types. This suggests that at least one network type significantly differs from the others in terms of mean Data Throughput.
However, the interaction effect between Network Type and Locality was not statistically significant (F = 0.791, p-value = 0.817), indicating that the impact of Network Type on Data Throughput does not vary significantly across different localities. Additionally, the effect of Locality alone was not significant (F = 0.683, p-value = 0.840), suggesting that the mean Data Throughput is similar across the localities. In summary, the main driver of the observed differences in Data Throughput is the variation among Network Types rather than Locality or the interaction between Network Type and Locality. This information can guide further investigation into optimizing Data Throughput, focusing on the specific network types that exhibit significant differences.
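With more than 12,000 observations, p-values alone say little about how large each effect is. A common complement is eta squared, the share of the total sum of squares attributed to each term. The sketch below computes it from the Sum Sq column of the ANOVA table above; the names `ss` and `eta_sq` are illustrative.

```r
# Sums of squares copied from the printed ANOVA table.
ss <- c(NetworkType            = 7125143,
        Locality               = 2947,
        "NetworkType:Locality" = 6826,
        Residuals              = 2853092)

# Eta squared: each term's share of the total variation in Data Throughput.
eta_sq <- ss / sum(ss)
print(round(eta_sq, 4))
```

Network Type accounts for roughly 71% of the variation in Data Throughput, while Locality and the interaction each account for well under 0.1%.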
In general, this analysis is highly useful for network operators, service providers, and decision-makers in the telecommunications industry. The significant difference in Data Throughput between LTE and 4G networks suggests that these two network types offer distinct performance characteristics. The lower mean Data Throughput for LTE may indicate areas where infrastructure upgrades or optimization efforts are needed to meet user expectations and improve service quality. This insight is crucial for resource allocation, network planning, and strategic decision-making, allowing stakeholders to focus their efforts on enhancing the performance of specific network types to ensure a more satisfying user experience.
lte_data <- data[data$NetworkType == "LTE", "DataThroughput"]
g4_data <- data[data$NetworkType == "4G", "DataThroughput"]
# Perform two-sample t-test
t_test_result <- t.test(lte_data, g4_data)
t_test_result
##
## Welch Two Sample t-test
##
## data: lte_data and g4_data
## t = -138.44, df = 4902.3, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.568371 -4.440796
## sample estimates:
## mean of x mean of y
## 1.994625 6.499208
# Boxplot comparing Data Throughput between LTE and 4G networks (5G rows are filtered out to match the title)
ggplot(subset(data, NetworkType %in% c("LTE", "4G")), aes(x = NetworkType, y = DataThroughput, fill = NetworkType)) +
geom_boxplot() +
labs(title = "Comparison of Data Throughput Between LTE and 4G Networks", x = "Network Type", y = "Data Throughput (Mbps)")
The result of the two-sample t-test comparing Data Throughput between LTE and 4G networks indicates a highly significant difference with a p-value less than 2.2e-16. The negative t-value shows that the mean Data Throughput for LTE (about 1.99 Mbps) is significantly lower than that of 4G (about 6.50 Mbps). The 95% confidence interval for the difference in means (-4.568 to -4.441) excludes zero and confirms this difference.
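The t-test covers only one of the three possible pairs of network types. One standard follow-up to a significant ANOVA is Tukey's honest significant difference test, which compares every pair while controlling the family-wise error rate. The sketch below runs it on synthetic data so that it is self-contained. The LTE and 4G means echo the t-test output, but the 5G mean, all standard deviations, and the group sizes are placeholders, and Tukey's test assumes roughly equal variances across groups, which may not hold for the real data.

```r
set.seed(1)  # reproducible synthetic data

# Synthetic throughput values; the 5G mean and all spreads are placeholders.
sim <- data.frame(
  NetworkType    = factor(rep(c("LTE", "4G", "5G"), each = 30)),
  DataThroughput = c(rnorm(30, mean = 2,   sd = 0.5),
                     rnorm(30, mean = 6.5, sd = 1.5),
                     rnorm(30, mean = 55,  sd = 15))
)

# One-way ANOVA followed by all pairwise comparisons.
fit <- aov(DataThroughput ~ NetworkType, data = sim)
tk  <- TukeyHSD(fit)

# One row per pair: difference in means, confidence bounds, adjusted p-value.
print(tk$NetworkType)
```

On the real data, the equivalent call would be `TukeyHSD(aov(DataThroughput ~ NetworkType, data = data))`.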
# Perform correlation test
cor_test_result <- cor.test(data$BladeRFxA9, data$srsRAN)
# Print the result
print(cor_test_result)
##
## Pearson's product-moment correlation
##
## data: data$BladeRFxA9 and data$srsRAN
## t = 238.3, df = 12619, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9013151 0.9076606
## sample estimates:
## cor
## 0.9045379
The correlation test between BladeRFxA9 and srsRAN resulted in a highly significant correlation (p-value < 2.2e-16) with a Pearson’s correlation coefficient (cor) of approximately 0.9045. The 95% confidence interval for the correlation ranged from 0.9013 to 0.9077. This implies a very strong and positive linear relationship between the two variables. The scatter plot with the fitted regression line further illustrates this strong positive correlation, showing that as BladeRFxA9 values increase, srsRAN values also tend to increase.
# Scatter plot with regression line
ggplot(data, aes(x = BladeRFxA9, y = srsRAN)) + geom_point() + geom_smooth(method = "lm", se = FALSE, color = "blue") + labs(title = "Scatter Plot with Regression Line", x = "BladeRFxA9", y = "srsRAN") + theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
In general, this strong positive correlation indicates that changes in BladeRFxA9 are associated with predictable changes in srsRAN. For stakeholders, such as network engineers or hardware developers, this insight is crucial. It suggests that the performance or behavior of the BladeRFxA9 hardware component is strongly aligned with the performance of the srsRAN system. This knowledge can guide decisions related to hardware optimizations, troubleshooting, or further exploration of the relationship between these variables to enhance overall system performance. Overall, such analyses contribute to informed decision-making in the context of network infrastructure and hardware development.
The analysis of the dataset yields several insights for stakeholders. For network operators, the clearest result is that network type, not locality, drives Data Throughput: the two-way ANOVA found a very large Network Type effect and no significant Locality or interaction effect. Infrastructure enhancements are therefore better prioritized by technology than by locality, for example by upgrading LTE coverage, whose mean throughput of about 2.0 Mbps trails the 6.5 Mbps of 4G. The timestamps record only minutes and seconds, so the time series cannot identify peak usage hours; dynamic network planning of that kind would require full date and time stamps.
For service providers, Latency is the metric most closely tied to Data Throughput (r = -0.65), while Signal Strength is more weakly and negatively related to it (r = -0.41). These are associations rather than demonstrated causes, but they indicate which factors to monitor for an enhanced user experience. Because throughput did not differ significantly between localities, the data give little support for geographically targeted services on throughput grounds alone. Technology upgrades, as highlighted by the comparison between LTE and 4G networks, guide businesses in prioritizing advancements to meet growing user expectations for faster and more reliable connectivity.
Overall, the dataset’s multifaceted analyses empower stakeholders with actionable intelligence, facilitating informed decision-making, improved network performance, and a heightened focus on enhancing user satisfaction. These insights provide a strategic roadmap for businesses to navigate the dynamic landscape of the telecommunications industry, ensuring competitiveness and sustained growth.