BY SHITTU OLALEKAN
IAQ stands for Indoor Air Quality. It refers to the quality of the air within and around buildings, particularly as it relates to the health and comfort of occupants. IAQ is influenced by various factors, including levels of indoor pollutants like carbon dioxide (CO₂), volatile organic compounds (VOCs), particulate matter, temperature, and humidity. Good IAQ is essential in workplaces and homes, as poor IAQ can lead to health issues, reduced productivity, and discomfort for those within the space.
To determine the levels of CO₂ and TVOCs in the offices.
To assess the effects of the introduction of indoor plants in reducing the level of pollution of indoor air.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(ggplot2)
library(stats)
library(tidyr)
library(dplyr)
# LOAD THE DATASET
sheet1 <- read_excel('My Data Analysis Table.xlsx', sheet = 'Sheet1')
## New names:
## • `` -> `...5`
## • `` -> `...6`
## • `` -> `...7`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
sheet2 <- read_excel('My Data Analysis Table.xlsx', sheet = 'Sheet2')
## New names:
## • `` -> `...5`
## • `` -> `...6`
## • `` -> `...7`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
# Inspect the loaded sheet. No object named `data` has been created, so
# str(data) and head(data) would describe base R's data() function instead
# of the dataset.
str(sheet1)
head(sheet1)
# Read the data
df <- read_excel("CO2_TVOC_DATA.xlsx")
# Create box plots for CO2 and TVOC
p1 <- ggplot(df, aes(x=`INDOOR PLANT`, y=CO2_MEAN, fill=TIME)) +
geom_boxplot() +
theme_minimal() +
labs(title="CO2 Levels by Plant Presence and Time",
y="CO2 (ppm)",
x="Indoor Plant Status")
p2 <- ggplot(df, aes(x=`INDOOR PLANT`, y=TVOC_MEAN, fill=TIME)) +
geom_boxplot() +
theme_minimal() +
labs(title="TVOC Levels by Plant Presence and Time",
y="TVOC (ppb)",
x="Indoor Plant Status")
print(p1)
print(p2)
# Ensure 'TIME' is in the correct order if it's a categorical variable (e.g., ordered factor)
df$TIME <- factor(df$TIME, ordered = TRUE)
# Create scatter plot for CO2 levels
p1 <- ggplot(df, aes(x=TIME, y=CO2_MEAN, color=`INDOOR PLANT`)) +
geom_point(size=3, alpha=0.7, position=position_jitter(width=0.1)) +
theme_minimal() +
labs(title="CO2 Levels Over Time by Plant Presence",
y="CO2 (ppm)",
x="Time",
color="Indoor Plant Status") +
theme(legend.position="top")
# Create scatter plot for TVOC levels
p2 <- ggplot(df, aes(x=TIME, y=TVOC_MEAN, color=`INDOOR PLANT`)) +
geom_point(size=3, alpha=0.7, position=position_jitter(width=0.1)) +
theme_minimal() +
labs(title="TVOC Levels Over Time by Plant Presence",
y="TVOC (ppb)",
x="Time",
color="Indoor Plant Status") +
theme(legend.position="top")
# Print the charts
print(p1)
print(p2)
# Scatter plot to show correlation between CO2 and TVOC levels
p <- ggplot(df, aes(x=CO2_MEAN, y=TVOC_MEAN, color=`INDOOR PLANT`, shape=TIME)) +
geom_point(size=3, alpha=0.7) +
theme_minimal() +
labs(title="Correlation Between CO2 and TVOC Levels",
x="CO2 (ppm)",
y="TVOC (ppb)",
color="Indoor Plant Status",
shape="Time") +
theme(legend.position="top")
# Print the scatter plot
print(p)
## Warning: Using shapes for an ordinal variable is not advised
This scatter plot provides a visual way to interpret the relationship between CO₂ and TVOC levels, and it also helps to explore how indoor plants and time might influence this relationship. Clear patterns or separations could support the hypothesis that indoor plants impact air quality, while mixed or overlapping patterns may suggest more nuanced or time-dependent effects.
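The visual impression from the scatter plot can be backed by a numeric measure of association. The following is a minimal sketch on simulated data, not the study data: the data frame `sim`, its `PLANT` column, and all values are assumptions made for illustration. It uses Spearman's rank correlation, which does not require either variable to be normally distributed, and then repeats the calculation within each plant group to check whether the overall association mainly reflects the difference between groups.

```r
# Simulated stand-in for the study data; every value here is made up
set.seed(42)
sim <- data.frame(
  CO2_MEAN = c(rnorm(20, mean = 670, sd = 40), rnorm(20, mean = 770, sd = 45)),
  PLANT    = rep(c("WITH", "WITHOUT"), each = 20)
)
# Let TVOC rise loosely with CO2, plus right-skewed noise
sim$TVOC_MEAN <- 0.0012 * (sim$CO2_MEAN - 500) + rexp(40, rate = 10)

# Rank-based correlation across all observations
rho_test <- cor.test(sim$CO2_MEAN, sim$TVOC_MEAN, method = "spearman")
print(rho_test$estimate)
print(rho_test$p.value)

# Same correlation computed separately within each plant group
by_group <- sapply(split(sim, sim$PLANT), function(g) {
  cor(g$CO2_MEAN, g$TVOC_MEAN, method = "spearman")
})
print(by_group)
```

If the within-group correlations are much weaker than the overall one, the apparent relationship is likely driven by the gap between the two groups rather than by a link between CO2 and TVOC themselves.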
# Shapiro-Wilk normality tests
cat("\nNormality Tests:\n")
##
## Normality Tests:
cat("\nCO2 Normality Test:\n")
##
## CO2 Normality Test:
print(shapiro.test(df$CO2_MEAN))
##
## Shapiro-Wilk normality test
##
## data: df$CO2_MEAN
## W = 0.96441, p-value = 0.2494
cat("\nTVOC Normality Test:\n")
##
## TVOC Normality Test:
print(shapiro.test(df$TVOC_MEAN))
##
## Shapiro-Wilk normality test
##
## data: df$TVOC_MEAN
## W = 0.92857, p-value = 0.01617
# Normality Test Visualization
# Histogram with normal curve overlay for CO2
# (after_stat(density) and linewidth replace the deprecated ..density.. and size)
p1 <- ggplot(df, aes(x=CO2_MEAN)) +
  geom_histogram(aes(y=after_stat(density)), bins=15, fill="skyblue", color="black", alpha=0.7) +
  stat_function(fun=dnorm, args=list(mean=mean(df$CO2_MEAN), sd=sd(df$CO2_MEAN)), color="red", linewidth=1) +
  theme_minimal() +
  labs(title="Histogram of CO2 Levels with Normal Curve",
       x="CO2 (ppm)",
       y="Density")
# Histogram with normal curve overlay for TVOC
p2 <- ggplot(df, aes(x=TVOC_MEAN)) +
  geom_histogram(aes(y=after_stat(density)), bins=15, fill="lightgreen", color="black", alpha=0.7) +
  stat_function(fun=dnorm, args=list(mean=mean(df$TVOC_MEAN), sd=sd(df$TVOC_MEAN)), color="red", linewidth=1) +
  theme_minimal() +
  labs(title="Histogram of TVOC Levels with Normal Curve",
       x="TVOC (ppb)",
       y="Density")
# Q-Q plot for CO2 levels
p3 <- ggplot(df, aes(sample=CO2_MEAN)) +
  stat_qq() +
  stat_qq_line(color="red") +
  theme_minimal() +
  labs(title="Q-Q Plot for CO2 Levels",
       x="Theoretical Quantiles",
       y="Sample Quantiles")
# Q-Q plot for TVOC levels
p4 <- ggplot(df, aes(sample=TVOC_MEAN)) +
  stat_qq() +
  stat_qq_line(color="red") +
  theme_minimal() +
  labs(title="Q-Q Plot for TVOC Levels",
       x="Theoretical Quantiles",
       y="Sample Quantiles")
# Print the charts
print(p1)
print(p2)
print(p3)
print(p4)
Shapiro-Wilk normality tests were performed to check whether the CO2 and TVOC data follow a normal distribution. The null hypothesis of the test is that the data are normally distributed. A p-value greater than 0.05 means that the data do not differ significantly from a normal distribution, while a p-value less than 0.05 indicates a significant departure from normality.
For CO2:
W = 0.96441: The W statistic measures how closely the data follow a normal distribution. A value closer to 1 indicates better agreement with normality.
p-value = 0.2494: The p-value is greater than 0.05, so the null hypothesis cannot be rejected. The CO2 data are consistent with a normal distribution (the test finds no evidence against normality; it does not prove it).
For TVOC:
W = 0.92857: The W statistic is lower than for CO2, suggesting that the TVOC data deviate more from normality.
p-value = 0.01617: The p-value is less than 0.05, so the null hypothesis is rejected, meaning that the TVOC data do not follow a normal distribution.
CO2 data appear to follow a normal distribution (p-value > 0.05).
TVOC data do not follow a normal distribution (p-value < 0.05).
Since the CO2 data show no significant departure from normality, parametric tests like the t-test and ANOVA are appropriate for them. The TVOC results from the same tests should be read with more caution, because the TVOC data depart from normality.
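Because the TVOC data depart from normality, one common robustness check is to run a rank-based test next to the t-test and see whether both lead to the same conclusion. The sketch below is illustrative only: it uses simulated, right-skewed values in a hypothetical data frame `tvoc_sim` with a hypothetical `PLANT` column, not the study data, and the Wilcoxon rank-sum test is offered as a possible check rather than as part of the original analysis.

```r
set.seed(1)
# Hypothetical right-skewed TVOC readings for two offices (not the study data)
tvoc_sim <- data.frame(
  TVOC_MEAN = c(rlnorm(19, meanlog = log(0.25), sdlog = 0.4),
                rlnorm(20, meanlog = log(0.42), sdlog = 0.4)),
  PLANT = factor(rep(c("WITH", "WITHOUT"), times = c(19, 20)))
)

# Welch t-test (parametric) and Wilcoxon rank-sum test (rank-based) side by side
t_res <- t.test(TVOC_MEAN ~ PLANT, data = tvoc_sim)
w_res <- wilcox.test(TVOC_MEAN ~ PLANT, data = tvoc_sim)

cat("Welch t-test p-value:     ", format.pval(t_res$p.value), "\n")
cat("Wilcoxon rank-sum p-value:", format.pval(w_res$p.value), "\n")
```

When both tests agree, the conclusion does not hinge on the normality assumption. When they disagree, the rank-based result is usually the safer one to report for skewed data.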
# T-tests
cat("\nT-tests:\n")
##
## T-tests:
co2_ttest <- t.test(CO2_MEAN ~ `INDOOR PLANT`, data=df)
tvoc_ttest <- t.test(TVOC_MEAN ~ `INDOOR PLANT`, data=df)
print(co2_ttest)
##
## Welch Two Sample t-test
##
## data: CO2_MEAN by INDOOR PLANT
## t = -7.003, df = 35.966, p-value = 3.269e-08
## alternative hypothesis: true difference in means between group WITH and group WITHOUT is not equal to 0
## 95 percent confidence interval:
## -130.92613 -72.12129
## sample estimates:
## mean in group WITH mean in group WITHOUT
## 667.4958 769.0195
print(tvoc_ttest)
##
## Welch Two Sample t-test
##
## data: TVOC_MEAN by INDOOR PLANT
## t = -3.663, df = 31.788, p-value = 0.0008994
## alternative hypothesis: true difference in means between group WITH and group WITHOUT is not equal to 0
## 95 percent confidence interval:
## -0.25489384 -0.07268511
## sample estimates:
## mean in group WITH mean in group WITHOUT
## 0.2742105 0.4380000
# T-test VISUALIZATION
# Calculate means and confidence intervals for CO2
co2_summary <- df %>%
group_by(`INDOOR PLANT`) %>%
summarise(
mean_CO2 = mean(CO2_MEAN),
sd_CO2 = sd(CO2_MEAN),
n = n(),
se_CO2 = sd_CO2 / sqrt(n),
ci_lower_CO2 = mean_CO2 - qt(0.975, df=n-1) * se_CO2,
ci_upper_CO2 = mean_CO2 + qt(0.975, df=n-1) * se_CO2
)
# Calculate means and confidence intervals for TVOC
tvoc_summary <- df %>%
group_by(`INDOOR PLANT`) %>%
summarise(
mean_TVOC = mean(TVOC_MEAN),
sd_TVOC = sd(TVOC_MEAN),
n = n(),
se_TVOC = sd_TVOC / sqrt(n),
ci_lower_TVOC = mean_TVOC - qt(0.975, df=n-1) * se_TVOC,
ci_upper_TVOC = mean_TVOC + qt(0.975, df=n-1) * se_TVOC
)
# Plot for CO2 levels with error bars
p1 <- ggplot(co2_summary, aes(x=`INDOOR PLANT`, y=mean_CO2, fill=`INDOOR PLANT`)) +
geom_bar(stat="identity", position=position_dodge(), width=0.6) +
geom_errorbar(aes(ymin=ci_lower_CO2, ymax=ci_upper_CO2), width=0.2, position=position_dodge(0.6)) +
theme_minimal() +
labs(title="CO2 Levels by Indoor Plant Presence",
y="Mean CO2 (ppm) ± 95% CI",
x="Indoor Plant Status") +
theme(legend.position="none")
# Plot for TVOC levels with error bars
p2 <- ggplot(tvoc_summary, aes(x=`INDOOR PLANT`, y=mean_TVOC, fill=`INDOOR PLANT`)) +
geom_bar(stat="identity", position=position_dodge(), width=0.6) +
geom_errorbar(aes(ymin=ci_lower_TVOC, ymax=ci_upper_TVOC), width=0.2, position=position_dodge(0.6)) +
theme_minimal() +
labs(title="TVOC Levels by Indoor Plant Presence",
y="Mean TVOC (ppb) ± 95% CI",
x="Indoor Plant Status") +
theme(legend.position="none")
# Print the charts
print(p1)
print(p2)
t = -7.003: This is the t-statistic. Its large magnitude indicates a substantial difference between the two offices (with and without plants) relative to the variability in the data. The negative sign shows that the mean CO2 level in the office with plants is lower than the mean CO2 level in the office without plants.
df = 35.966: The degrees of freedom for the test. For the Welch test these are estimated from the sample sizes and the variances of both offices, which is why the value is not a whole number.
p-value = 3.269e-08: The p-value is extremely small (much smaller than 0.05), indicating that the difference in CO2 means between the two offices is statistically significant.
Alternative hypothesis: The hypothesis being tested is whether the true difference in CO2 means between the office with plants and the office without plants is not equal to zero (i.e., there is a difference).
95% confidence interval: The true difference in means is between -130.93 and -72.12. Since the entire confidence interval is below zero, it confirms that the office with plants has a significantly lower mean CO2 level compared to the office without plants.
Sample estimates:
Mean in office WITH plants: 667.4958
Mean in office WITHOUT plants: 769.0195
Thus, the presence of indoor plants significantly reduces CO2 levels, as the office with plants has lower CO2 levels than the office without.
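The quantities reported by the Welch test (t-statistic, degrees of freedom, and confidence interval) can be reproduced by hand, which makes clear where each number comes from. The sketch below uses simulated readings in the hypothetical vectors `with_plants` and `without_plants`, not the study data, and compares the manual results with `t.test()`.

```r
set.seed(7)
# Hypothetical CO2 readings (ppm) for two offices; not the study data
with_plants    <- rnorm(19, mean = 670, sd = 40)
without_plants <- rnorm(20, mean = 770, sd = 50)

# Variance of each group mean
v1 <- var(with_plants) / length(with_plants)
v2 <- var(without_plants) / length(without_plants)

# Difference in means, its standard error, and the t-statistic
diff_means <- mean(with_plants) - mean(without_plants)
se_diff    <- sqrt(v1 + v2)
t_stat     <- diff_means / se_diff

# Welch-Satterthwaite degrees of freedom: depends on variances and sample sizes
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(with_plants) - 1) + v2^2 / (length(without_plants) - 1))

# 95% confidence interval for the difference in means
ci <- diff_means + c(-1, 1) * qt(0.975, df_welch) * se_diff

# Compare with R's built-in Welch test
ref <- t.test(with_plants, without_plants)
print(c(t = t_stat, df = df_welch, lower = ci[1], upper = ci[2]))
print(all.equal(unname(ref$statistic), t_stat))
print(all.equal(unname(ref$parameter), df_welch))
```

The sign of the t-statistic follows from the order of subtraction: the first group minus the second. A negative value therefore means the first group has the lower mean.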
t = -3.663: The t-statistic is -3.663, which indicates a significant difference in TVOC levels between the two offices. Again, the negative sign suggests the office with plants has a lower mean TVOC level than the office without plants.
df = 31.788: The degrees of freedom for the test.
p-value = 0.0008994: The p-value is less than 0.05, which means the difference in TVOC levels between the two offices is statistically significant.
Alternative hypothesis: The hypothesis being tested is whether the true difference in TVOC means between the two offices is not zero (i.e., there is a difference).
95% confidence interval: The true difference in means is between -0.255 and -0.073. Since the entire confidence interval is below zero, it confirms that the office with plants has significantly lower TVOC levels than the office without plants.
Sample estimates:
Mean in office WITH plants: 0.2742
Mean in office WITHOUT plants: 0.4380
Thus, indoor plants significantly reduce TVOC levels, as the office with plants has lower TVOC levels than the office without.
CO2:
The presence of indoor plants significantly reduces CO2 levels.
The mean CO2 level in the office with plants (667.5) is significantly lower than the mean CO2 level in the office without plants (769.0).
TVOC:
The presence of indoor plants significantly reduces TVOC levels.
The mean TVOC level in the office with plants (0.274) is significantly lower than the mean TVOC level in the office without plants (0.438).
Both t-tests show that indoor plants have a significant impact in reducing both CO2 and TVOC levels, supporting the idea that plants improve indoor air quality.
# ANOVA
co2_aov <- aov(CO2_MEAN ~ `INDOOR PLANT` * TIME, data=df)
tvoc_aov <- aov(TVOC_MEAN ~ `INDOOR PLANT` * TIME, data=df)
cat("\nCO2 ANOVA Results:\n")
##
## CO2 ANOVA Results:
print(summary(co2_aov))
## Df Sum Sq Mean Sq F value Pr(>F)
## `INDOOR PLANT` 1 100428 100428 54.697 1.19e-08 ***
## TIME 1 8877 8877 4.835 0.0346 *
## `INDOOR PLANT`:TIME 1 2161 2161 1.177 0.2854
## Residuals 35 64263 1836
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
cat("\nTVOC ANOVA Results:\n")
##
## TVOC ANOVA Results:
print(summary(tvoc_aov))
## Df Sum Sq Mean Sq F value Pr(>F)
## `INDOOR PLANT` 1 0.2614 0.26139 13.26 0.000869 ***
## TIME 1 0.0343 0.03430 1.74 0.195755
## `INDOOR PLANT`:TIME 1 0.0138 0.01380 0.70 0.408414
## Residuals 35 0.6901 0.01972
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ANOVA TEST VISUALIZATION
# Box plot for CO2 levels with means
p1 <- ggplot(df, aes(x=`INDOOR PLANT`, y=CO2_MEAN, fill=TIME)) +
geom_boxplot(outlier.shape=NA, alpha=0.5) +
stat_summary(fun=mean, geom="point", shape=20, size=3, color="red", position=position_dodge(0.75)) +
theme_minimal() +
labs(title="CO2 Levels by Plant Presence and Time",
x="Indoor Plant Status",
y="CO2 (ppm)") +
theme(legend.position="top")
# Box plot for TVOC levels with means
p2 <- ggplot(df, aes(x=`INDOOR PLANT`, y=TVOC_MEAN, fill=TIME)) +
geom_boxplot(outlier.shape=NA, alpha=0.5) +
stat_summary(fun=mean, geom="point", shape=20, size=3, color="red", position=position_dodge(0.75)) +
theme_minimal() +
labs(title="TVOC Levels by Plant Presence and Time",
x="Indoor Plant Status",
y="TVOC (ppb)") +
theme(legend.position="top")
# Print the box plots
print(p1)
print(p2)
Here’s what each part of the output means for CO2:
Df (Degrees of Freedom): For each factor, this is the number of categories (levels) minus one. For an interaction, it is the product of the degrees of freedom of the factors involved.
INDOOR PLANT has 1 degree of freedom (2 levels: with or without plant).
TIME also has 1 degree of freedom (2 levels of the TIME variable).
INDOOR PLANT:TIME interaction has 1 degree of freedom.
Residuals have 35 degrees of freedom and represent the leftover variation that isn’t explained by the factors.
Sum Sq (Sum of Squares): This is a measure of variation (or difference) that each factor or interaction explains in the data.
INDOOR PLANT explains 100,428 units of variation in CO2 levels.
Mean Sq (Mean Squares): This is the average variation explained by each factor. It’s the Sum of Squares divided by the Degrees of Freedom.
F value: This is the ratio of the variation explained by the factor (its Mean Sq) to the residual variation (the residual Mean Sq). A higher F value means the factor explains more variation relative to the unexplained noise.
Pr(>F) (p-value): This tells you if the effect is statistically significant. If the p-value is less than 0.05, the result is significant.
For INDOOR PLANT, the p-value is 1.19e-08, which is much smaller than 0.05. This means the indoor plant significantly affects CO2 levels.
For TIME, the p-value is 0.0346, which is less than 0.05, so time also significantly affects CO2.
For the interaction (INDOOR PLANT:TIME), the p-value is 0.2854, which is greater than 0.05, so there is no significant interaction between plant presence and time for CO2.
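The relationships between the columns of the ANOVA table can be checked with a few lines of arithmetic. The sketch below copies the Sum Sq and Df values from the CO2 table above and recomputes Mean Sq, the F value, and the p-value. The variable names are chosen for illustration, and the results should agree with the table up to rounding, because the printed Sum Sq values are themselves rounded.

```r
# Values copied from the CO2 ANOVA table in this report
sum_sq <- c(plant = 100428, time = 8877, interaction = 2161)
df_eff <- c(plant = 1, time = 1, interaction = 1)
ss_res <- 64263   # residual sum of squares
df_res <- 35      # residual degrees of freedom

# Mean Sq = Sum Sq / Df
mean_sq <- sum_sq / df_eff
ms_res  <- ss_res / df_res

# F value = Mean Sq of the effect / Mean Sq of the residuals
f_value <- mean_sq / ms_res

# p-value = upper tail of the F distribution with (Df effect, Df residual)
p_value <- pf(f_value, df_eff, df_res, lower.tail = FALSE)

print(round(ms_res))
print(round(f_value, 3))
print(signif(p_value, 3))
```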
Here’s the breakdown for TVOC:
INDOOR PLANT: The p-value is 0.000869, which is very small, so the indoor plant significantly affects TVOC levels.
TIME: The p-value is 0.195755, which is greater than 0.05, meaning TIME does not significantly affect TVOC levels.
INDOOR PLANT:TIME: The p-value is 0.408414, which is also greater than 0.05, meaning there is no significant interaction between plant presence and time for TVOC.
ANOVA shows significant effects:
Indoor plants significantly affect both CO2 (p<0.001) and TVOC (p<0.001)
Time of day affects CO2 (p<0.05) but not TVOC
No significant interaction effects
For CO2:
Indoor plants significantly reduce CO2 levels.
Time also affects CO2 levels.
There is no significant interaction between indoor plants and time for CO2 levels.
For TVOC:
Indoor plants significantly reduce TVOC levels.
Time does not significantly affect TVOC levels.
There is no significant interaction between indoor plants and time for TVOC levels.
Overall, the experiment shows that indoor plants help reduce both CO2 and TVOC levels, but the effect of time is significant only for CO2, not for TVOC.