Smart Campus Efficiency and Resource Optimization System: A Data-Driven Analysis of Classroom Utilization, Energy
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.5.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.3
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.5.3
## corrplot 0.95 loaded
library(moments)
campus<-read.csv("advanced_smart_campus_7500_rows.csv")
str(campus)
## 'data.frame': 7500 obs. of 14 variables:
## $ Room_ID : chr "R1" "R2" "R3" "R4" ...
## $ Capacity : int 139 96 122 128 132 47 113 136 87 116 ...
## $ Students_Used: int 139 23 39 13 132 47 28 90 87 110 ...
## $ Electricity : int 11 30 12 16 18 12 47 54 61 51 ...
## $ Time_Slot : chr "Afternoon" "Morning" "Afternoon" "Evening" ...
## $ Date : chr "2024-01-01 00:00:00" "2024-01-01 01:00:00" "2024-01-01 02:00:00" "2024-01-01 03:00:00" ...
## $ Department : chr "CE" "CS" "CS" "ME" ...
## $ Floor : int 1 1 5 4 3 3 2 7 7 6 ...
## $ Utilization : num 100 24 32 10.2 100 ...
## $ Efficiency : num 12.636 0.767 3.25 0.812 7.333 ...
## $ Day : chr "2024-01-01" "2024-01-01" "2024-01-01" "2024-01-01" ...
## $ Hour : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Is_Weekend : chr "False" "False" "False" "False" ...
## $ Load_Level : chr "High" "Low" "Low" "Low" ...
dim(campus)
## [1] 7500 14
head(campus)
## Room_ID Capacity Students_Used Electricity Time_Slot Date
## 1 R1 139 139 11 Afternoon 2024-01-01 00:00:00
## 2 R2 96 23 30 Morning 2024-01-01 01:00:00
## 3 R3 122 39 12 Afternoon 2024-01-01 02:00:00
## 4 R4 128 13 16 Evening 2024-01-01 03:00:00
## 5 R5 132 132 18 Morning 2024-01-01 04:00:00
## 6 R6 47 47 12 Afternoon 2024-01-01 05:00:00
## Department Floor Utilization Efficiency Day Hour Is_Weekend Load_Level
## 1 CE 1 100.00000 12.6363636 2024-01-01 0 False High
## 2 CS 1 23.95833 0.7666667 2024-01-01 1 False Low
## 3 CS 5 31.96721 3.2500000 2024-01-01 2 False Low
## 4 ME 4 10.15625 0.8125000 2024-01-01 3 False Low
## 5 CE 3 100.00000 7.3333333 2024-01-01 4 False High
## 6 ECE 3 100.00000 3.9166667 2024-01-01 5 False High
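Answer: The data set contains 7,500 observations of 14 variables describing classroom use: room identifier, capacity, students present, electricity consumption, time slot, timestamp, department, and floor, plus derived fields such as Utilization, Efficiency, and Load_Level.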
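The str() output shows Date and Is_Weekend stored as character. A minimal sketch of converting them to proper types follows. The column names come from the data set, but `campus_sample` is a hypothetical three-row stand-in, its third row is invented for illustration, and the UTC time zone is an assumption.

```r
# Hypothetical three-row sample mirroring the column types reported by str(campus);
# the third row (a Saturday) is invented for illustration
campus_sample <- data.frame(
  Date = c("2024-01-01 00:00:00", "2024-01-01 01:00:00", "2024-01-06 09:00:00"),
  Is_Weekend = c("False", "False", "True"),
  stringsAsFactors = FALSE
)

# Parse the timestamp; the UTC time zone is an assumption
campus_sample$Date <- as.POSIXct(campus_sample$Date,
                                 format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Convert the "True"/"False" strings to a logical column
campus_sample$Is_Weekend <- campus_sample$Is_Weekend == "True"

# Cross-check the flag against the calendar: wday is 0 for Sunday, 6 for Saturday
wday <- as.POSIXlt(campus_sample$Date)$wday
campus_sample$Weekend_Check <- wday %in% c(0, 6)

str(campus_sample)
```

With typed columns, filters such as `Is_Weekend` or date ranges can be written directly instead of comparing strings.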
colSums(is.na(campus))
## Room_ID Capacity Students_Used Electricity Time_Slot
## 0 0 0 0 0
## Date Department Floor Utilization Efficiency
## 0 0 0 0 0
## Day Hour Is_Weekend Load_Level
## 0 0 0 0
sum(is.na(campus))
## [1] 0
Answer: Every column has zero missing values, so the data set is complete and no imputation is needed.
mean(campus$Utilization, na.rm = TRUE)
## [1] 74.32348
median(campus$Utilization, na.rm = TRUE)
## [1] 88.80215
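Answer: The mean utilization is 74.3%, while the median is 88.8%. The median is the middle value and is less affected by extreme values; the mean falling well below it indicates that a minority of low-utilization records pulls the average down. na.rm = TRUE drops missing values, although this data set has none.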
range(campus$Capacity, na.rm = TRUE)
## [1] 30 149
range(campus$Electricity, na.rm = TRUE)
## [1] 5 79
Answer: Capacity ranges from 30 to 149 and Electricity from 5 to 79 (the data set does not state the unit). The range gives the lowest and highest values and a first sense of how widely each variable is spread.
sort(table(campus$Department), decreasing = TRUE)
##
## IT ME ECE CE BBA CS
## 1322 1293 1259 1226 1212 1188
Answer: The frequency of each department was calculated using table(), and the results were sorted in descending order to identify which departments have the highest and lowest number of records.
campus <- read.csv("advanced_smart_campus_7500_rows.csv")
sum(duplicated(campus))
## [1] 0
campus <- campus[!duplicated(campus), ]
dim(campus)
## [1] 7500 14
Answer: The data set was checked for duplicate records using the duplicated() function. The result shows that there are no duplicate rows in the data set. After applying duplicate removal, the number of rows remains unchanged, confirming that the data set is already clean.
library(dplyr)
campus %>%
arrange(desc(Utilization)) %>%
select(Room_ID, Department, Utilization, Efficiency) %>%
head(10)
## Room_ID Department Utilization Efficiency
## 1 R1 CE 100 12.636364
## 2 R5 CE 100 7.333333
## 3 R6 ECE 100 3.916667
## 4 R9 BBA 100 1.426230
## 5 R11 ME 100 10.583333
## 6 R17 CE 100 1.381818
## 7 R21 IT 100 2.511111
## 8 R26 BBA 100 1.256410
## 9 R27 IT 100 1.215385
## 10 R28 CS 100 1.734694
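Answer: This lists ten records with the highest utilization. All ten are at 100%, so many records tie for first place and head(10) simply returns the first ten of them in file order. Full rooms indicate high demand, but Efficiency among these ten still varies widely, from about 1.2 to 12.6.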
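Because every row in the top-10 output has a Utilization of 100, it can help to count the ties and break them with a second column. A minimal base R sketch using the six rows printed by head(campus); choosing Efficiency as the tie-breaker is an assumption, not something the analysis specifies.

```r
# First six rows from head(campus), retyped for a self-contained example
rooms <- data.frame(
  Room_ID     = c("R1", "R2", "R3", "R4", "R5", "R6"),
  Utilization = c(100, 23.95833, 31.96721, 10.15625, 100, 100),
  Efficiency  = c(12.6363636, 0.7666667, 3.25, 0.8125, 7.3333333, 3.9166667),
  stringsAsFactors = FALSE
)

# How many records share the maximum utilization?
n_tied <- sum(rooms$Utilization == max(rooms$Utilization))

# Break ties explicitly: highest utilization first, then highest efficiency
# (the tie-breaker column is an assumption)
ranked <- rooms[order(-rooms$Utilization, -rooms$Efficiency), ]
ranked$Room_ID
```

In this sample three of the six rows are tied at 100, and the explicit ordering makes the ranking independent of file order.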
### Question 2.3: Identify underutilized rooms
campus %>%
filter(Utilization < 0.3) %>%
select(Room_ID, Capacity, Students_Used, Utilization)
## [1] Room_ID Capacity Students_Used Utilization
## <0 rows> (or 0-length row.names)
range(campus$Utilization)
## [1] 6.756757 100.000000
Answer: The filter returns zero rows because it compares Utilization against 0.3, while Utilization is stored as a percentage (its range is 6.76 to 100). To find rooms below 30% utilization, the condition should be Utilization < 30. Rooms under that threshold represent underused space, and identifying them is useful for improving scheduling and allocation.
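A minimal sketch of the filter with the threshold on the same 0 to 100 scale as Utilization, using the six rows printed by head(campus). The `threshold` variable and the `rooms` sample are introduced here for illustration; base R subsetting is used so the example runs without dplyr.

```r
# First six rows from head(campus), retyped for a self-contained example
rooms <- data.frame(
  Room_ID       = c("R1", "R2", "R3", "R4", "R5", "R6"),
  Capacity      = c(139, 96, 122, 128, 132, 47),
  Students_Used = c(139, 23, 39, 13, 132, 47),
  stringsAsFactors = FALSE
)

# Utilization as a percentage, matching the scale of the Utilization column
rooms$Utilization <- 100 * rooms$Students_Used / rooms$Capacity

# Threshold expressed in percent, not as a fraction
threshold <- 30
underused <- rooms[rooms$Utilization < threshold, ]
underused$Room_ID
# → "R2" "R4"
```

On the full data set the equivalent dplyr call would be `filter(Utilization < 30)`.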
campus %>%
group_by(Department) %>%
summarise(avg_util = mean(Utilization, na.rm = TRUE)) %>%
arrange(desc(avg_util))
## # A tibble: 6 × 2
## Department avg_util
## <chr> <dbl>
## 1 ME 75.2
## 2 ECE 74.9
## 3 CS 74.8
## 4 BBA 74.7
## 5 CE 73.6
## 6 IT 72.7
Answer: This shows the average classroom utilization for each department, sorted from highest to lowest. The averages are close: ME is highest at 75.2% and IT lowest at 72.7%, a spread of only 2.5 percentage points, so no department stands out as markedly better or worse.
### Question 2.5: Average electricity usage by floor
campus %>%
group_by(Floor) %>%
summarise(avg_electricity = mean(Electricity, na.rm = TRUE))
## # A tibble: 7 × 2
## Floor avg_electricity
## <int> <dbl>
## 1 1 41.5
## 2 2 43.0
## 3 3 42.0
## 4 4 42.5
## 5 5 42.2
## 6 6 41.9
## 7 7 42.4
Answer: This calculates the average electricity consumption for each floor. The averages fall in a narrow band, from 41.5 on floor 1 to 43.0 on floor 2, so floor has little bearing on consumption in this data set.
campus$Utilization_Level <- ifelse(campus$Utilization > 70, "High",
ifelse(campus$Utilization > 40, "Medium","Low"))
Answer: Each record is labeled by utilization: High above 70, Medium above 40 and up to 70, and Low at 40 or below.
table(campus$Utilization_Level)
##
## High Low Medium
## 4621 1396 1483
Answer: Of the 7,500 records, 4,621 (61.6%) are High, 1,483 (19.8%) are Medium, and 1,396 (18.6%) are Low. Most sessions are well filled, while roughly one in five falls at or below 40% utilization.
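As an alternative to nested ifelse(), the same three bands can be built with cut(), which also gives an ordered Low, Medium, High factor so that table() prints the levels in a logical order. This is a sketch under the same thresholds (40 and 70); the `utilization` vector holds illustrative values, including the boundary cases.

```r
# Illustrative values, including the boundaries 40 and 70
utilization <- c(100, 23.96, 31.97, 10.16, 70, 40, 55)

# Intervals are (-Inf, 40], (40, 70], (70, Inf), matching the nested ifelse()
level <- cut(utilization,
             breaks = c(-Inf, 40, 70, Inf),
             labels = c("Low", "Medium", "High"),
             right  = TRUE)

# Reference result from the original nested ifelse() logic
level_ifelse <- ifelse(utilization > 70, "High",
                       ifelse(utilization > 40, "Medium", "Low"))

table(level)
```

Both approaches assign the same label to every value; cut() only changes the type (factor) and the level order.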
campus %>%
group_by(Department) %>%
summarise(avg_eff = mean(Efficiency, na.rm = TRUE)) %>%
arrange(desc(avg_eff))
## # A tibble: 6 × 2
## Department avg_eff
## <chr> <dbl>
## 1 ME 2.48
## 2 CS 2.48
## 3 CE 2.39
## 4 BBA 2.36
## 5 IT 2.32
## 6 ECE 2.29
Answer: This calculates the average efficiency of each department, sorted from highest to lowest. ME and CS lead at 2.48 and ECE is lowest at 2.29, so the differences between departments are small.
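The data set does not document how Efficiency is computed. The rows printed by head(campus) are consistent with Efficiency = Students_Used / Electricity; this is an inference from those six rows, not a stated definition. A minimal check:

```r
# Values retyped from head(campus)
rooms <- data.frame(
  Students_Used = c(139, 23, 39, 13, 132, 47),
  Electricity   = c(11, 30, 12, 16, 18, 12),
  Efficiency    = c(12.6363636, 0.7666667, 3.25, 0.8125, 7.3333333, 3.9166667)
)

# Assumed definition: students served per unit of electricity
rooms$Efficiency_check <- rooms$Students_Used / rooms$Electricity

# TRUE when the recomputed values match the stored ones within rounding
matches <- isTRUE(all.equal(rooms$Efficiency, rooms$Efficiency_check,
                            tolerance = 1e-6))
matches
```

Running the same comparison on all 7,500 rows would confirm or refute the assumed definition.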
library(ggplot2)
ggplot(campus, aes(x = Utilization)) +
geom_histogram(bins = 30) +
labs(title = "Utilization Distribution",
x = "Utilization",
y = "Count")
Answer: A histogram was plotted to understand the distribution of utilization values. Most bars are on the higher side, so classrooms are generally well utilized; the few bars on the lower side correspond to underutilized rooms.
ggplot(campus, aes(x = Department, y = Electricity)) +
geom_boxplot() +
labs(title = "Electricity Usage by Department",
x = "Department",
y = "Electricity")
Answer: This boxplot compares electricity usage across different departments. It shows the distribution, median, and variation of electricity consumption in each department. Departments with higher or wider boxplots indicate greater electricity usage or variability.
ggplot(campus, aes(x = Capacity, y = Students_Used)) +
geom_point() +
labs(title = "Capacity vs Students Used",
x = "Capacity",
y = "Students Used")
Answer: This scatter plot shows the relationship between room capacity and the number of students using the rooms. Points closer to a straight upward pattern indicate a positive relationship between capacity and student usage.
ggplot(campus, aes(x = Load_Level)) +
geom_bar() +
labs(title = "Load Level Distribution",
x = "Load Level",
y = "Count")
Answer: This bar chart shows the distribution of different load levels in the campus dataset. It displays how many rooms or areas fall under each load level category. The chart helps identify which load level is most common on the campus.
ggplot(campus, aes(x = Department, y = Utilization)) +
geom_bar(stat = "summary", fun = "mean") +
theme_minimal()
Answer: A bar chart was used to compare the average utilization across different departments. The mean function was applied to calculate the average utilization for each department.
Q1 <- quantile(campus$Utilization, 0.25)
Q3 <- quantile(campus$Utilization, 0.75)
IQR_value <- Q3 - Q1
lower <- Q1 - 1.5 * IQR_value
upper <- Q3 + 1.5 * IQR_value
campus$Utilization[campus$Utilization < lower | campus$Utilization > upper]
## numeric(0)
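Answer: The IQR measures the spread of the middle 50% of the data. Values more than 1.5 × IQR below Q1 or above Q3 are flagged as outliers; the result numeric(0) means no Utilization value falls outside these fences.
### Question 5.2: Detect Outliers using Z-score Method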
z_scores <- (campus$Utilization - mean(campus$Utilization, na.rm = TRUE)) /
sd(campus$Utilization, na.rm = TRUE)
outliers <- campus$Utilization[abs(z_scores) > 3]
outliers
## numeric(0)
Answer: Outliers were checked using the Z-score method. Values with an absolute Z-score greater than 3 are considered extreme; the result numeric(0) shows that no Utilization value meets this criterion.
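Both outlier rules can be wrapped in small helper functions so they can be reused on other columns such as Electricity. This is a sketch: the function names `iqr_outliers` and `z_outliers` and the `values` vector are hypothetical, chosen so that one value (200) is clearly extreme.

```r
# Values outside Q1 - k*IQR or Q3 + k*IQR (k = 1.5 is the usual convention)
iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  x[!is.na(x) & (x < q[1] - fence | x > q[2] + fence)]
}

# Values whose absolute Z-score exceeds the cutoff
z_outliers <- function(x, cutoff = 3) {
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  x[!is.na(z) & abs(z) > cutoff]
}

# Illustrative data: 40, 41, ..., 60 plus one extreme value
values <- c(seq(40, 60, by = 1), 200)

iqr_outliers(values)
z_outliers(values)
```

Note that the Z-score rule needs a reasonably large sample: with n observations the largest possible absolute Z-score is (n - 1) / sqrt(n), so a cutoff of 3 can never be exceeded when n is 10 or fewer.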
library(dplyr)
campus %>%
group_by(Department) %>%
summarise(
avg_util = mean(Utilization),
avg_electricity = mean(Electricity)
)
## # A tibble: 6 × 3
## Department avg_util avg_electricity
## <chr> <dbl> <dbl>
## 1 BBA 74.7 42.4
## 2 CE 73.6 42.7
## 3 CS 74.8 41.7
## 4 ECE 74.9 42.5
## 5 IT 72.7 42.6
## 6 ME 75.2 41.6
library(moments)
skewness(campus$Utilization)
## [1] -0.7341737
Answer: Skewness was calculated to understand the shape of the distribution. A negative value indicates that the data is left-skewed, meaning most values are concentrated on the higher side.
ggplot(campus, aes(x = Utilization)) +
geom_density(fill = "green") +
labs(title = "Density Plot of Utilization") +
theme_minimal()
Answer: The density plot shows the shape of the utilization distribution. Consistent with the negative skewness (-0.73), the data is left-skewed: most values are concentrated on the higher side, with a longer tail toward low utilization.
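To make the skewness value less of a black box, the standard moment-based coefficient can be computed by hand as the third central moment divided by the second central moment raised to the power 3/2. The sketch below uses base R only; `skew_moment` is a hypothetical helper, and its agreement with moments::skewness() is expected but not verified here.

```r
# Moment-based skewness: m3 / m2^(3/2), using population (divide-by-n) moments
skew_moment <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Illustrative data: mostly high values with one low value (long left tail)
left_tailed <- c(1, 8, 9, 9, 10, 10, 10)
symmetric   <- c(1, 2, 3, 4, 5)

skew_moment(left_tailed)  # negative: tail extends toward low values
skew_moment(symmetric)    # zero for a symmetric sample
```

A negative result, as for Utilization, means the bulk of the data sits high and the tail points toward low values.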
numeric_data <- campus[, c("Utilization",
"Electricity",
"Students_Used",
"Capacity",
"Efficiency")]
cor_matrix <- cor(numeric_data)
print(cor_matrix)
## Utilization Electricity Students_Used Capacity Efficiency
## Utilization 1.000000000 -0.003481175 0.56978980 -0.39654103 0.2534893
## Electricity -0.003481175 1.000000000 0.01315685 0.01502653 -0.6156781
## Students_Used 0.569789799 0.013156852 1.00000000 0.47945998 0.4199094
## Capacity -0.396541029 0.015026529 0.47945998 1.00000000 0.1858072
## Efficiency 0.253489302 -0.615678100 0.41990943 0.18580717 1.0000000
Answer: The correlation matrix helps identify linear relationships between numerical variables in the smart campus dataset. Electricity is essentially uncorrelated with Utilization (-0.003), Students_Used (0.013), and Capacity (0.015), so fuller or larger rooms do not consume measurably more electricity in this data. Its only sizeable correlation is with Efficiency (-0.62). Utilization correlates positively with Students_Used (0.57) and negatively with Capacity (-0.40), meaning larger rooms tend to be less full.
library(corrplot)
corrplot(cor_matrix,
method = "color",
type = "upper",
tl.col = "black",
tl.srt = 45)
Answer: The heatmap visually represents the strength of relationships between variables. Stronger colors indicate correlations of larger magnitude. Based on the matrix above, the strongest off-diagonal cells are Electricity with Efficiency (-0.62) and Utilization with Students_Used (0.57), while the cells linking Electricity to Utilization, Students_Used, and Capacity are close to zero.
Answer: Interpretation: The correlation analysis does not support the expectation that electricity usage rises with utilization or student occupancy; both correlations are close to zero (-0.003 and 0.013). Electricity consumption in this data set appears largely independent of how full a room is, which suggests that power is drawn regardless of occupancy. If so, administrators may be able to reduce unnecessary power consumption by aligning energy use with actual occupancy through room allocation and energy management.
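To read the Electricity row of the matrix at a glance, its entries can be ranked by absolute value. The sketch below hard-codes the four values from the printed matrix so it runs on its own; with the full data the same vector would come from `cor_matrix["Electricity", ]` after dropping the diagonal entry.

```r
# Electricity row of the printed correlation matrix (diagonal omitted)
electricity_cor <- c(
  Utilization   = -0.003481175,
  Students_Used =  0.013156852,
  Capacity      =  0.015026529,
  Efficiency    = -0.615678100
)

# Rank by strength of association, ignoring sign
ranked <- electricity_cor[order(abs(electricity_cor), decreasing = TRUE)]
round(ranked, 3)

# Squared correlation: share of variance one variable explains linearly
round(ranked^2, 4)
```

Only Efficiency shows a meaningful association with Electricity; the squared correlations for the other three variables are effectively zero.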
set.seed(123)
sample_rows <- sample(1:nrow(campus),
0.8 * nrow(campus))
train_data <- campus[sample_rows, ]
test_data <- campus[-sample_rows, ]
Answer: The dataset is split at random into a training set (80%, 6,000 rows) and a testing set (20%, 1,500 rows), with set.seed(123) making the split reproducible. A regression model fitted on the training set predicts electricity usage from utilization, the number of students, and capacity; holding out the testing set allows its predictions to be evaluated on data the model has not seen.
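A slightly more defensive version of the split is sketched below. It uses seq_len() and floor() so the sample size is always a whole number, and it checks that the two index sets do not overlap. The row count `n` is set to 7500 to match the data set; the variable names are illustrative.

```r
set.seed(123)

n <- 7500  # number of rows in the campus data set

# 80% of the row indices for training, rounded down to a whole number
train_idx <- sample(seq_len(n), size = floor(0.8 * n))

# Everything not sampled goes to the test set
test_idx <- setdiff(seq_len(n), train_idx)

c(train = length(train_idx), test = length(test_idx))
# → train 6000, test 1500

# Sanity check: no row appears in both sets
length(intersect(train_idx, test_idx))
# → 0
```

The 6,000 training rows are consistent with the 5,996 residual degrees of freedom reported by summary(model) (6,000 rows minus 4 estimated coefficients).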
model <- lm(Electricity ~ Utilization + Students_Used + Capacity,
data = train_data)
summary(model)
##
## Call:
## lm(formula = Electricity ~ Utilization + Students_Used + Capacity,
## data = train_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.012 -18.784 0.477 18.347 37.358
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.43624 2.70390 16.434 <2e-16 ***
## Utilization -0.02989 0.03083 -0.970 0.332
## Students_Used 0.03133 0.03010 1.041 0.298
## Capacity -0.02046 0.02481 -0.825 0.410
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.51 on 5996 degrees of freedom
## Multiple R-squared: 0.000214, Adjusted R-squared: -0.0002862
## F-statistic: 0.4278 on 3 and 5996 DF, p-value: 0.7331
Answer: The linear regression model estimates a linear relationship between electricity usage and the predictor variables. None of the three coefficients is statistically significant (p = 0.33, 0.30, and 0.41), the overall F-test has p = 0.73, and R-squared is 0.0002. The model therefore explains essentially none of the variation in electricity consumption.
predictions <- predict(model,
newdata = test_data)
results <- data.frame(
Actual = test_data$Electricity,
Predicted = predictions
)
head(results)
## Actual Predicted
## 4 16 41.92116
## 10 51 42.67437
## 11 12 42.82728
## 12 76 42.05610
## 14 37 42.55362
## 15 62 42.47190
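Answer: The prediction step uses the trained regression model to estimate electricity usage for the testing dataset, and the predictions are compared with the actual values to evaluate model performance. In the six rows shown, every prediction falls between 41.9 and 42.8 while the actual values range from 12 to 76, so the model is predicting close to the average regardless of the inputs.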
library(ggplot2)
ggplot(results,
aes(x = Actual,
y = Predicted)) +
geom_point(color = "blue",
size = 3) +
geom_abline(slope = 1,
intercept = 0,
color = "red") +
labs(title = "Actual vs Predicted Electricity Consumption",
x = "Actual Values",
y = "Predicted Values") +
theme_minimal()
Answer: The scatter plot compares actual electricity
usage with predicted values generated by the regression model. Points
closer to the red reference line indicate more accurate predictions.
This visualization helps assess how well the model performs.
rmse <- sqrt(mean((results$Actual -
results$Predicted)^2))
print(rmse)
## [1] 21.66092
Answer: The RMSE (Root Mean Square Error) measures the typical size of the prediction error; a lower value indicates better accuracy. Here the RMSE is 21.66 on a variable that ranges from 5 to 79, about the same as the residual standard error on the training data (21.51), so predictions are typically off by roughly 22 units.
summary(model)$r.squared
## [1] 0.0002139938
Answer: The R-squared value indicates the share of variability in electricity usage explained by the predictors. The value of 0.0002 means the model explains about 0.02% of the variance, so utilization, student count, and capacity have almost no linear predictive power for electricity usage.
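An RMSE is easier to judge next to a baseline that always predicts the training mean. The sketch below uses simulated data as a stand-in (an assumption: the outcome is generated independently of the predictor), so the numbers are not the campus results; `rmse` is a small helper defined here.

```r
set.seed(1)

# Simulated stand-in data (assumption): outcome unrelated to the predictor
train <- data.frame(x = runif(800, 0, 100),
                    y = sample(5:79, 800, replace = TRUE))
test  <- data.frame(x = runif(200, 0, 100),
                    y = sample(5:79, 200, replace = TRUE))

# Root mean square error between actual and predicted values
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

fit <- lm(y ~ x, data = train)

rmse_model    <- rmse(test$y, predict(fit, newdata = test))
rmse_baseline <- rmse(test$y, mean(train$y))  # always predict the training mean

c(model = rmse_model, baseline = rmse_baseline)
```

If the model's RMSE is not clearly below the baseline's, the predictors add no useful information. Applying the same comparison to `test_data$Electricity` with `mean(train_data$Electricity)` would show whether that is the case for the campus model.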
Answer: The regression analysis does not show that utilization or student occupancy influences electricity usage: no coefficient is statistically significant and R-squared is about 0.0002. In this dataset, electricity consumption is effectively unrelated to how heavily a room is used. For campus energy optimization this points to a mismatch between consumption and occupancy, and predicting electricity usage would require variables that are not recorded here.
dept_count <- table(campus$Department)
pie(dept_count,
main = "Department Distribution")
Answer: The pie chart represents the share of records belonging to each department in the smart campus dataset. The slices are close in size, from 1,188 records for CS to 1,322 for IT, so the records are spread fairly evenly across departments.
plot(campus$Electricity,
type = "l",
col = "blue",
main = "Electricity Consumption Trend",
xlab = "Observations",
ylab = "Electricity")
Answer: The line graph plots electricity usage for each observation in row order, which appears to follow the hourly timestamps. It can reveal increasing or decreasing trends in energy consumption and unusually high readings, although with 7,500 points on one axis individual readings are hard to distinguish.
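Answer: Key Findings: 1. Electricity usage is essentially uncorrelated with utilization (r = -0.003) and student count (r = 0.013). 2. Average utilization, efficiency, and electricity usage differ only slightly between departments and floors. 3. Neither the IQR method nor the Z-score method flags any outliers in Utilization. 4. The regression model explains almost none of the variation in electricity usage (R-squared = 0.0002, RMSE = 21.66). 5. The strongest correlations are Electricity with Efficiency (-0.62) and Utilization with Students_Used (0.57).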
Answer: Future Improvements: 1. Integrate real-time IoT sensor data. 2. Build a dashboard using Shiny. 3. Add machine learning models for advanced prediction. 4. Monitor classroom energy efficiency in real time. 5. Automate smart resource allocation.