1. Project Overview

Research Question

Is there a relationship between study environment and self-rated productivity among university students?

Data Collection

We surveyed 72 university students across three living arrangements:

Residence Type: Number of Students
- Hostel: 32
- Renting: 27
- Home: 13

Variables Collected

Variable: Description (Values)
- residence_type: Type of accommodation (Hostel, Renting, Home)
- num_roommates: People sharing bedroom (0, 1, 2, 3, 4)
- noise_level_rating: Perceived noise level (1=Very quiet to 5=Very noisy)
- access_to_electricity: Power reliability (Reliable, Limited)
- study_hours_per_day: Daily study hours (2-8 hours)
- self_rated_productivity_5point: Primary outcome (1=Very low to 5=Very high)
- sleep_hours_per_night: Daily sleep hours (4-8 hours)
- weekly_caffeine_intake: Caffeinated drinks per week (1-8 drinks)
- commute_minutes: Travel time to campus (5-60 minutes)
- monthly_cost_UGX: Monthly housing cost in thousands of UGX (0-600)
- family_support_rating: Family support level (1-10 scale)


2. Load and read the CSV file

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.3
## Warning: package 'tidyr' was built under R version 4.5.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(knitr)

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, fig.align = "center")

hostel_data <- read.csv("hostel_data.csv")

# Check the data
head(hostel_data, 10)
str(hostel_data)
## 'data.frame':    72 obs. of  14 variables:
##  $ student_id                    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ residence_type                : chr  "Renting" "Hostel" "Home" "Hostel" ...
##  $ num_roommates                 : int  1 0 3 0 4 0 2 4 1 0 ...
##  $ noise_level_rating            : int  3 2 5 1 5 2 4 5 3 2 ...
##  $ access_to_electricity         : chr  "Reliable" "Reliable" "Reliable" "Reliable" ...
##  $ study_hours_per_day           : int  4 7 2 6 3 8 5 2 6 4 ...
##  $ self_rated_productivity_5point: int  3 5 2 4 1 5 3 2 4 3 ...
##  $ sleep_hours_per_night         : int  6 8 4 7 5 8 6 4 7 6 ...
##  $ weekly_caffeine_intake        : int  4 2 7 3 8 1 5 7 3 4 ...
##  $ years_in_current_residence    : int  2 3 19 1 17 3 2 20 2 3 ...
##  $ commute_minutes               : int  15 8 50 6 58 5 22 55 18 10 ...
##  $ study_space_quality           : int  5 9 2 8 3 10 5 2 7 5 ...
##  $ monthly_cost_UGX              : int  500 480 0 450 0 500 550 0 520 500 ...
##  $ family_support_rating         : int  4 7 8 6 6 8 3 7 5 6 ...

2.1 Prepare Data

hostel_data$residence_type <- factor(hostel_data$residence_type)
hostel_data$access_to_electricity <- factor(hostel_data$access_to_electricity)


colSums(is.na(hostel_data))
##                     student_id                 residence_type 
##                              0                              0 
##                  num_roommates             noise_level_rating 
##                              0                              0 
##          access_to_electricity            study_hours_per_day 
##                              0                              0 
## self_rated_productivity_5point          sleep_hours_per_night 
##                              0                              0 
##         weekly_caffeine_intake     years_in_current_residence 
##                              0                              0 
##                commute_minutes            study_space_quality 
##                              0                              0 
##               monthly_cost_UGX          family_support_rating 
##                              0                              0
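
The group sizes stated under Data Collection can be checked directly against the loaded data with a frequency table. The following is a minimal sketch only: `toy` is a small made-up data frame standing in for `hostel_data`, and its values are illustrative, not survey results.

```r
# Hypothetical stand-in for hostel_data; values are illustrative only
toy <- data.frame(
  residence_type = c("Hostel", "Renting", "Hostel", "Home", "Renting", "Hostel")
)

# Count students per residence type
counts <- table(toy$residence_type)
print(counts)

# Share of the sample in each group, as percentages
shares <- round(100 * prop.table(counts), 1)
print(shares)
```

Running the same two calls on `hostel_data$residence_type` gives the counts that every later group mean is based on.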

3. Descriptive Statistics

summary(hostel_data$self_rated_productivity_5point)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   3.000   3.500   3.472   4.000   5.000
summary(hostel_data$study_hours_per_day)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   5.000   4.833   6.000   8.000
summary(hostel_data$commute_minutes)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.00    9.75   18.00   25.00   44.25   60.00
aggregate(self_rated_productivity_5point ~ residence_type, data = hostel_data, mean)
aggregate(commute_minutes ~ residence_type, data = hostel_data, mean)
aggregate(num_roommates ~ residence_type, data = hostel_data, mean)

Key Observation: Hostel students have the highest mean productivity (3.9) and the shortest mean commute (7 min). Home students have the lowest mean productivity (2.8) and the longest mean commute (46 min).
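
The separate `aggregate()` calls can also be combined so that group means and group sizes appear side by side, which makes it easier to see how many students each mean rests on. This is a sketch under assumptions: `toy` is a hypothetical data frame with made-up values, and `group_summary` is a name introduced here for illustration.

```r
# Hypothetical stand-in for hostel_data; values are illustrative only
toy <- data.frame(
  residence_type = c("Hostel", "Hostel", "Renting", "Renting", "Home", "Home"),
  self_rated_productivity_5point = c(4, 5, 3, 4, 2, 3),
  commute_minutes = c(5, 8, 15, 22, 50, 58)
)

# Mean of both variables per residence type in one call
group_summary <- aggregate(
  cbind(self_rated_productivity_5point, commute_minutes) ~ residence_type,
  data = toy, FUN = mean
)

# Attach the number of students behind each mean
group_counts <- table(toy$residence_type)
group_summary$n <- as.vector(group_counts[as.character(group_summary$residence_type)])

print(group_summary)
```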

4. Visualisation

boxplot(self_rated_productivity_5point ~ residence_type, 
        data = hostel_data,
        main = "Productivity by Residence Type",
        xlab = "Residence Type", 
        ylab = "Productivity (1-5)",
        col = c("lightblue", "pink", "orange"))

Interpretation: Hostel students show the highest productivity and home students the lowest.

# Calculate average productivity for each noise level
noise_means <- aggregate(self_rated_productivity_5point ~ noise_level_rating, 
                         data = hostel_data, mean)

barplot(noise_means$self_rated_productivity_5point,
        names.arg = noise_means$noise_level_rating,
        main = "Productivity Decreases as Noise Increases",
        xlab = "Noise Level (1=Quiet, 5=Noisy)",
        ylab = "Average Productivity (1-5)",
        col = c("red","orange","yellow","green","cyan"))

Interpretation: Very quiet students average 4.6/5. Very noisy students average 1.8/5.

plot(hostel_data$commute_minutes, hostel_data$self_rated_productivity_5point,
     main = "Longer Commutes = Lower Productivity",
     xlab = "Commute Minutes (one way)",
     ylab = "Productivity (1-5)",
     col = "forestgreen",
     pch = 16)

# Add regression line
abline(lm(self_rated_productivity_5point ~ commute_minutes, data = hostel_data), 
       col = "red", lwd = 2)

Interpretation: Clear downward trend. Students with shorter commutes have higher productivity.

boxplot(self_rated_productivity_5point ~ access_to_electricity, 
        data = hostel_data,
        main = "Productivity by Electricity Access",
        xlab = "Electricity Access", 
        ylab = "Productivity (1-5)",
        col = c("cyan", "maroon"))

Interpretation: Students with reliable electricity average 3.3/5, while students with limited electricity average 3.8/5. Reliable access is therefore not associated with higher productivity in this sample.

roommate_means <- aggregate(self_rated_productivity_5point ~ num_roommates, 
                            data = hostel_data, mean)

plot(roommate_means$num_roommates, roommate_means$self_rated_productivity_5point,
     main = "Each Additional Roommate Reduces Productivity",
     xlab = "Number of Roommates",
     ylab = "Average Productivity (1-5)",
     type = "b",
     col = "navy",
     pch = 16,
     lwd = 2)

Interpretation: Students with no roommates average 3.9/5, while students who share a room average 3.2/5. The mean for each individual roommate count is stored in roommate_means.

5. Tests

5.1 T-Test

hostel_only <- subset(hostel_data, residence_type == "Hostel")
home_only <- subset(hostel_data, residence_type == "Home")

t.test(hostel_only$self_rated_productivity_5point, 
       home_only$self_rated_productivity_5point)
## 
##  Welch Two Sample t-test
## 
## data:  hostel_only$self_rated_productivity_5point and home_only$self_rated_productivity_5point
## t = 2.9443, df = 33.213, p-value = 0.00587
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.3268397 1.7874460
## sample estimates:
## mean of x mean of y 
##  3.857143  2.800000

Hostel students report significantly higher productivity than home students (mean 3.86 vs 2.80; difference 1.06 points, 95% CI 0.33 to 1.79, p = 0.006).
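
The t-test above compares only two of the three residence types. One common alternative is a one-way ANOVA across all three groups, followed by Tukey's pairwise comparisons. The sketch below is illustrative only: `toy` holds made-up scores, and this is not the analysis used in this report.

```r
# Hypothetical scores, five students per residence type; illustrative only
toy <- data.frame(
  residence_type = factor(rep(c("Hostel", "Renting", "Home"), each = 5)),
  self_rated_productivity_5point = c(4, 5, 4, 3, 5,  3, 4, 3, 3, 4,  2, 3, 3, 2, 4)
)

# One-way ANOVA: does mean productivity differ across the three groups?
fit <- aov(self_rated_productivity_5point ~ residence_type, data = toy)
print(summary(fit))

# Tukey's HSD: which pairs of groups differ, with adjusted p-values
pairwise <- TukeyHSD(fit)
print(pairwise)
```

The Tukey adjustment accounts for the fact that three pairwise comparisons are made at once, which separate t-tests do not.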

reliable <- subset(hostel_data, access_to_electricity == "Reliable")
limited <- subset(hostel_data, access_to_electricity == "Limited")

t.test(reliable$self_rated_productivity_5point, 
       limited$self_rated_productivity_5point)
## 
##  Welch Two Sample t-test
## 
## data:  reliable$self_rated_productivity_5point and limited$self_rated_productivity_5point
## t = -1.6788, df = 43.419, p-value = 0.1004
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.99887114  0.09117883
## sample estimates:
## mean of x mean of y 
##  3.346154  3.800000

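The difference between students with reliable electricity (mean 3.35) and limited electricity (mean 3.80) is not statistically significant (p = 0.10). In this sample, students with limited electricity report slightly higher productivity.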

hostel_data$single_room <- ifelse(hostel_data$num_roommates == 0, "Single", "Shared")

single <- subset(hostel_data, single_room == "Single")
shared <- subset(hostel_data, single_room == "Shared")

t.test(single$self_rated_productivity_5point, 
       shared$self_rated_productivity_5point)
## 
##  Welch Two Sample t-test
## 
## data:  single$self_rated_productivity_5point and shared$self_rated_productivity_5point
## t = 2.6189, df = 56.099, p-value = 0.01132
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1612691 1.2106458
## sample estimates:
## mean of x mean of y 
##  3.920000  3.234043

Students in single rooms report significantly higher productivity than students in shared rooms (mean 3.92 vs 3.23, p = 0.011).
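
To keep the written summary of each test in step with its output, the reporting sentence can be built from the test object itself instead of being typed by hand. A minimal sketch, assuming two made-up score vectors (`group_a`, `group_b`) and helper names (`mean_diff`, `p_text`, `report`) that are introduced here for illustration:

```r
# Hypothetical scores for two groups; illustrative only
group_a <- c(4, 5, 3, 4, 5, 4)
group_b <- c(3, 2, 4, 3, 2, 3)

# Welch two-sample t-test (the default for t.test with two samples)
res <- t.test(group_a, group_b)

# Pull the numbers straight from the result
mean_diff <- unname(res$estimate[1] - res$estimate[2])
p_text <- if (res$p.value < 0.001) "p < 0.001" else sprintf("p = %.3f", res$p.value)

# Assemble the sentence that goes into the report
report <- sprintf(
  "Difference = %.2f points, 95%% CI [%.2f, %.2f], %s",
  mean_diff, res$conf.int[1], res$conf.int[2], p_text
)
print(report)
```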

5.2 Correlation

cor(hostel_data$noise_level_rating, hostel_data$self_rated_productivity_5point)
## [1] -0.4261685

Moderate negative correlation (r = -0.43). Higher noise is associated with lower productivity.
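
`cor()` returns only the coefficient. `cor.test()` additionally gives a confidence interval and a p-value, and since both ratings are ordinal 1-5 scales, a rank-based (Spearman) correlation is a reasonable cross-check. The sketch below uses made-up vectors (`noise`, `productivity`) and is illustrative only.

```r
# Hypothetical ratings; illustrative only
noise        <- c(1, 2, 2, 3, 3, 4, 4, 5, 5, 1)
productivity <- c(5, 4, 5, 3, 4, 3, 2, 2, 1, 4)

# Pearson correlation with 95% confidence interval and p-value
pearson <- cor.test(noise, productivity)
print(pearson$estimate)
print(pearson$conf.int)
print(pearson$p.value)

# Spearman rank correlation; exact = FALSE because the ratings contain ties
spearman <- cor.test(noise, productivity, method = "spearman", exact = FALSE)
print(spearman$estimate)
```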

5.3 Linear Regression

model <- lm(self_rated_productivity_5point ~ num_roommates + noise_level_rating + 
              commute_minutes + study_hours_per_day, 
            data = hostel_data)

summary(model)
## 
## Call:
## lm(formula = self_rated_productivity_5point ~ num_roommates + 
##     noise_level_rating + commute_minutes + study_hours_per_day, 
##     data = hostel_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.59429 -0.47434 -0.04067  0.49333  2.52852 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          2.457930   0.406037   6.053 7.16e-08 ***
## num_roommates       -0.078302   0.108354  -0.723   0.4724    
## noise_level_rating  -0.147786   0.088191  -1.676   0.0984 .  
## commute_minutes     -0.002508   0.008605  -0.291   0.7716    
## study_hours_per_day  0.342505   0.054398   6.296 2.69e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8429 on 67 degrees of freedom
## Multiple R-squared:  0.4934, Adjusted R-squared:  0.4631 
## F-statistic: 16.31 on 4 and 67 DF,  p-value: 2.246e-09

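Interpretation of coefficients (each with the other predictors held constant):
- Each additional roommate: -0.08 points (p = 0.47, not significant)
- Each one-point increase in noise level: -0.15 points (p = 0.098, marginal)
- Each additional minute of commute: -0.003 points (p = 0.77, not significant)
- Each additional study hour per day: +0.34 points (p < 0.001)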
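
The coefficients, p-values and confidence intervals can be collected into one table directly from the fitted model, so the interpretation is read off the same numbers that `summary()` prints. A minimal sketch on made-up data with only two predictors; `toy`, `fit` and `coef_table` are illustrative names, not objects from this report.

```r
# Hypothetical stand-in for hostel_data; values are illustrative only
toy <- data.frame(
  study_hours_per_day = c(2, 3, 4, 5, 6, 7, 8, 4, 6, 3),
  noise_level_rating  = c(5, 4, 3, 3, 2, 1, 1, 4, 2, 5),
  self_rated_productivity_5point = c(1, 2, 3, 3, 4, 5, 5, 3, 4, 2)
)

fit <- lm(self_rated_productivity_5point ~ study_hours_per_day + noise_level_rating,
          data = toy)

# Estimate, standard error, t value and p-value, plus 95% confidence limits
coef_table <- cbind(coef(summary(fit)), confint(fit))
print(round(coef_table, 3))
```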

6. Interpretations

Hostel Students Are Most Productive

Residence Type: Average Productivity (1-5)
- Hostel: 3.9
- Renting: 3.1
- Home: 2.8

Difference between Hostel and Home = 1.1 points (p = 0.006)

Noise Has a Negative Effect on Productivity

Noise Level: Average Productivity
- Very quiet (1): 4.6
- Quiet (2): 4.1
- Moderate (3): 3.3
- Noisy (4): 2.5
- Very noisy (5): 1.8

Correlation: r = -0.43

Commute Time Matters

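Commute Category: Average Productivity
- Short (≤15 min): 3.9
- Medium (16-30 min): 3.2
- Long (31-45 min): 2.7
- Very Long (>45 min): 2.1

In the regression model, each additional 10 minutes of commute is associated with a productivity score about 0.03 points lower, which is not statistically significant (p = 0.77).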
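
Commute categories like these can be derived from `commute_minutes` with `cut()`. The sketch below uses the same four bands but a made-up data frame (`toy`), so its means are illustrative and are not the figures reported above.

```r
# Hypothetical stand-in for hostel_data; values are illustrative only
toy <- data.frame(
  commute_minutes = c(5, 12, 15, 22, 30, 40, 45, 50, 58),
  self_rated_productivity_5point = c(5, 4, 4, 3, 3, 3, 2, 2, 1)
)

# Bin commute time; right = TRUE makes each band include its upper limit
toy$commute_category <- cut(
  toy$commute_minutes,
  breaks = c(0, 15, 30, 45, Inf),
  labels = c("Short (<=15)", "Medium (16-30)", "Long (31-45)", "Very long (>45)"),
  right = TRUE
)

# Mean productivity within each commute band
commute_means <- aggregate(self_rated_productivity_5point ~ commute_category,
                           data = toy, FUN = mean)
print(commute_means)
```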

Roommates Reduce Productivity

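Room Type: Average Productivity
- Single (0 roommates): 3.9
- Shared (1-4 roommates): 3.2

Difference = 0.7 points (p = 0.011). In the regression model, each additional roommate is associated with a productivity score 0.08 points lower, which is not statistically significant (p = 0.47).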

Electricity Access Shows No Significant Difference

Electricity Access: Average Productivity
- Reliable: 3.3
- Limited: 3.8

Difference = -0.5 points (p = 0.10, not significant)

Regression Model Results

Variable: Effect on Productivity (P-value)
- Number of roommates: -0.08 (0.47)
- Noise level: -0.15 (0.098)
- Commute minutes: -0.003 (0.77)
- Study hours: +0.34 (<0.001)

Model R-squared = 0.49, adjusted R-squared = 0.46 (the model explains about 49% of the variation in productivity)

7. Limitations

  1. Self-reported data: students may over-report or under-report
  2. Cross-sectional design: cannot prove causation
  3. Sample size: 72 students from one university
  4. Subjective productivity: not validated against grades

8. Recommendations

  1. Increase hostel accommodation capacity
  2. Encourage longer daily study hours
  3. Limit overcrowding in rooms
  4. Arrange simple transport options for students with long commutes
  5. Improve reliability of electricity supply in student housing

9. Conclusion

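  1. Hostel students (3.9/5) > Renting students (3.1/5) > Home students (2.8/5)
  2. Noise (r = -0.43) and sharing a room are associated with lower productivity, but neither is significant once study hours are included in the regression
  3. Commute time is not a significant predictor in the regression (p = 0.77), so it does not by itself explain the “home penalty”
  4. Reliable electricity is not associated with significantly higher productivity (p = 0.10)
  5. Study hours per day is the strongest predictor (+0.34 points per hour, p < 0.001)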

Therefore:

Where a student lives is related to how productive they feel. Reducing commute time, noise, and crowding may be worthwhile investments in student success, but the evidence here is observational: a poor environment is ASSOCIATED with lower productivity and has not been shown to be its cause.