```r
library(tidyverse)
data <- read.csv("C:\\Users\\oswat\\OneDrive\\Desktop\\DATA1001\\data1001_survey_data_2025_S1.csv")
```

Our report explores the relationships between students' rent, commute time and residency status (domestic or international). Graphical and numerical analysis shows that domestic students have longer commutes than international students. Linear modelling revealed a weak negative correlation between rent and commute time. Further data is needed to assess student accessibility to campus.
The data was derived from a survey completed by 2145 students (2103 of whom consented) across 2024 Semester 2 and 2025 Semester 1, covering 28 variables. Our analysis focuses on how commute time is influenced by rent and student type (domestic or international). We chose rent and commute because we believed students would know these values off the top of their head and could report them accurately, unlike variables such as expected/aimed mark and number of hours studied, which can be influenced by social desirability bias.
Limitations:
Some limitations of the data arise from selection and consent bias, which omit responses from DATA1001 students who did not participate in (or consent to) the survey. In addition, the data only reflects a sample of DATA1001/DATA1901 students, not the wider population of University of Sydney students. There is also potential response bias, where questions could have been misinterpreted or answered dishonestly.
Assumptions:
We assumed independence, i.e. that responses to the survey were not influenced by other people's responses; however, students completing the survey at the same time in the workshop could have influenced each other. We also assumed that extreme values were typos or incorrect answers rather than genuine responses, and that a variable was approximately normally distributed if its histogram was bell-shaped.
Data cleaning:
Data cleaning involved removing extreme values (identified using boxplots in R) to ensure data integrity, separating out non-consenting participants, and removing blank values in the “student_type” variable. In practice, we kept responses with rent from 50 up to (but not including) 1500 AUD and commute greater than 0 and less than 300 minutes. Excluding rent values of 0 (students who indicated in the survey that they live at home), together with the other steps, reduced our domestic sample from 1279 to 221.
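To see how much each cleaning rule removes on its own, the conditions can be evaluated separately before they are combined. The sketch below uses a small, made-up data frame (hypothetical values, not the survey data) with the same column names and thresholds as the `filter()` call used in this report, and only base R so it runs without the tidyverse. Unlike `dplyr::filter()`, base logical indexing does not silently drop `NA` results; the toy data deliberately has no missing values.

```r
# Hypothetical toy survey rows (NOT the DATA1001 data), used only to
# illustrate how each cleaning rule removes responses.
survey <- data.frame(
  consent = c(rep("I consent to take part in the study", 7), ""),
  student_type = c("Domestic", "Domestic", "International", "International",
                   "", "Domestic", "International", "Domestic"),
  rent = c(0, 350, 500, 2000, 400, 600, 450, 300),
  commute = c(40, 25, 15, 20, 30, 0, 350, 10)
)

# The same thresholds as the report's filter() call
keep_consent <- survey$consent == "I consent to take part in the study"
keep_rent    <- survey$rent >= 50 & survey$rent < 1500
keep_commute <- survey$commute > 0 & survey$commute < 300
keep_type    <- survey$student_type != ""

# Number of rows each rule would drop on its own
dropped <- c(
  consent      = sum(!keep_consent),
  rent         = sum(!keep_rent),
  commute      = sum(!keep_commute),
  student_type = sum(!keep_type)
)
print(dropped)

# Rows that pass every rule
clean <- survey[keep_consent & keep_rent & keep_commute & keep_type, ]
print(nrow(clean))
```

With the real data, the same `sum(!...)` counts could be computed on `data` to report how many responses each rule removed; because rows can fail several rules at once, these per-rule counts need not add up to the total removed.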
Example graphs, before and after data cleaning
```r
# Boxplot of rent by student type, before cleaning
bp1 <- ggplot(data, aes(x = student_type, y = rent, fill = student_type)) +
  geom_boxplot() +
  labs(title = "Relationship between rent of international and domestic students",
       x = "Student type", y = "Rent (AUD)", subtitle = "(Before IDA)", fill = "Student Type")
# Centre the title and subtitle
bp1 + theme(
  plot.title = element_text(hjust = 0.5),
  plot.subtitle = element_text(hjust = 0.5)
)
```

```r
# Keep consenting respondents with plausible rent and commute values
# and a recorded student type
fil_data <- filter(data, consent == "I consent to take part in the study",
                   rent >= 50 & rent < 1500,
                   commute > 0 & commute < 300,
                   student_type != "")
```

```r
# Boxplot of rent by student type, after cleaning
bp2 <- ggplot(fil_data, aes(x = student_type, y = rent, fill = student_type)) +
  geom_boxplot() +
  labs(title = "Relationship between rent of international and domestic students",
       x = "Student type", y = "Rent (AUD)", subtitle = "(After IDA)", fill = "Student Type")
# Centre the title and subtitle
bp2 + theme(
  plot.title = element_text(hjust = 0.5),
  plot.subtitle = element_text(hjust = 0.5)
)
```

The commute times of domestic and international students differ noticeably.
International students' commutes are positively skewed, with a median (20 minutes) below the mean (26.3 minutes); this may reflect many international students living on or near campus.
Domestic students' commutes are also positively skewed (median 25 minutes, mean 35.1 minutes), and their median is higher than that of international students (25 vs 20 minutes), suggesting that many domestic students live off campus.
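One rough way to compare the two skews using only the summaries reported in this section is Pearson's second skewness coefficient, 3 × (mean − median) / SD, where positive values indicate right (positive) skew. This measure was not part of the original analysis; the sketch below is an illustrative check that plugs in the reported means, medians and standard deviations.

```r
# Summary statistics reported in this report (commute, minutes)
commute_summary <- data.frame(
  group  = c("Domestic", "International"),
  mean   = c(35.05498, 26.30478),
  median = c(25, 20),
  sd     = c(30.16238, 17.25875)
)

# Pearson's second skewness coefficient: 3 * (mean - median) / sd
commute_summary$pearson_skew <-
  3 * (commute_summary$mean - commute_summary$median) / commute_summary$sd

print(commute_summary[, c("group", "pearson_skew")])
```

Both groups come out at roughly 1 (about 1.00 for domestic and 1.10 for international students), so on this crude measure the two distributions are similarly right-skewed.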
The interquartile range (IQR) of international students' commutes is narrower (15 minutes) than that of domestic students (35 minutes), suggesting that commute time is more consistent among international students. This may indicate that international students occupy a larger share of on-campus or nearby housing, while domestic students live across a wider variety of surrounding suburbs at varying distances from the university.
We excluded commutes of 300 minutes or more for both groups; this cut-off is well above the largest remaining values (120 minutes for international and 180 minutes for domestic students). International students had 9 suspected outliers (above 52.5 minutes) compared with 3 for domestic students (above 102.5 minutes); the higher count for international students is likely due in part to their larger sample size (690 vs 221).
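The outlier thresholds quoted above follow the usual boxplot rule: values above Q3 + 1.5 × IQR (or below Q1 − 1.5 × IQR) are flagged as suspected outliers. The sketch below recomputes these fences from the quartiles printed by `quantile()` in this report; it does not touch the raw data.

```r
# Quartiles reported by quantile() in this report (commute, minutes)
q1 <- c(Domestic = 15, International = 15)
q3 <- c(Domestic = 50, International = 30)

iqr <- q3 - q1                 # interquartile range
upper_fence <- q3 + 1.5 * iqr  # boxplot rule for suspected high outliers
lower_fence <- q1 - 1.5 * iqr  # boxplot rule for suspected low outliers

print(rbind(iqr, lower_fence, upper_fence))
```

ggplot2's `geom_boxplot()` uses the same 1.5 × IQR whisker rule by default, and both it and `quantile()` default to the same (type 7) quantile definition. The lower fences are negative, so no low-commute outliers are possible in either group. With the report's objects, `sum(ifil_data$commute > upper_fence[["International"]])` would count the flagged international commutes.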
```r
# Boxplot of commute time by student type (cleaned data)
bp3 <- ggplot(fil_data, aes(x = student_type, y = commute, fill = student_type)) +
  geom_boxplot() +
  labs(title = "International and domestic commutes to USYD", x = "Student type",
       y = "Commute (mins)", fill = "Student Type")
# Centre the title
bp3 + theme(
  plot.title = element_text(hjust = 0.5)
)
```

```r
# Density histogram of commute time for both student types
hist1 <- ggplot(fil_data, aes(x = commute, fill = student_type)) +
  geom_histogram(aes(y = after_stat(density)), bins = 25, position = "dodge") +
  labs(x = "Commute (mins)", y = "Density",
       title = "Density of domestic and international commuters", fill = "Student Type")
hist1 + theme(
  plot.title = element_text(hjust = 0.5)
)
```

```r
# Split the cleaned data by student type
dfil_data <- filter(fil_data, student_type == "Domestic")
ifil_data <- filter(fil_data, student_type == "International")

# Five-number summaries (IQR = 75% quantile minus 25% quantile)
iqrd <- quantile(dfil_data$commute)
iqri <- quantile(ifil_data$commute)
iqrd
#>     0%    25%    50%    75%   100%
#>   1.15  15.00  25.00  50.00 180.00
iqri
#>    0%   25%   50%   75%  100%
#>   0.5  15.0  20.0  30.0 120.0

# Means and standard deviations of commute time
mean(dfil_data$commute)
#> [1] 35.05498
sd(dfil_data$commute)
#> [1] 30.16238
mean(ifil_data$commute)
#> [1] 26.30478
sd(ifil_data$commute)
#> [1] 17.25875
mean(fil_data$commute)
#> [1] 28.4275

# Sample sizes after cleaning
length(ifil_data$commute)
#> [1] 690
length(dfil_data$commute)
#> [1] 221

# Domestic sample size before cleaning
ddata <- filter(data, student_type == "Domestic")
length(ddata$commute)
#> [1] 1279
```
The regression line (intercept 38.73 minutes, slope -0.01912 minutes per AUD of rent) and the weak negative correlation between rent and commute (r = -0.21) suggest that students paying higher rent tend to have slightly shorter commutes, i.e. live closer to the University of Sydney. Because the correlation is weak, this trend is not consistent across students.
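To make the slope concrete, the fitted line can be evaluated at a couple of rent values. The sketch below plugs the reported coefficients into commute = intercept + slope × rent; the rents of 300 and 600 AUD are arbitrary illustrative choices, not values singled out by the analysis.

```r
# Coefficients reported by lm(commute ~ rent) in this report
intercept <- 38.73291   # minutes
slope     <- -0.01912   # minutes per AUD of rent

# Predicted commute (minutes) on the fitted regression line
predict_commute <- function(rent) intercept + slope * rent

rents <- c(300, 600)    # illustrative rent values (AUD)
pred <- predict_commute(rents)
print(round(pred, 2))

# Change in predicted commute for each extra 100 AUD of rent
print(round(100 * slope, 2))
```

Each extra 100 AUD of rent lowers the predicted commute by only about 1.9 minutes, which is small relative to the spread of commute times. The intercept corresponds to a rent of 0, outside the cleaned data (rent of at least 50), so it is best read as an anchor for the line rather than a prediction. With the fitted object, `predict(model, newdata = data.frame(rent = c(300, 600)))` gives the same values.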
The scatterplot shows no clear overall linear pattern, indicating little linear association between the variables, and the RMS error is large (20.96 minutes), barely smaller than the standard deviation of commute time (21.44 minutes).
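The closeness of the RMS error to the commute SD follows from the least-squares identity RMS error = SD_y × sqrt(1 − r²): with r ≈ −0.21, the factor sqrt(1 − r²) is about 0.98, so the line removes very little of the spread. The sketch below checks the identity on simulated toy data (hypothetical, not the survey data) and then applies the formula to the values reported in this report.

```r
set.seed(1)
# Hypothetical toy data standing in for rent (AUD) and commute (minutes); NOT the survey data
x_rent    <- runif(200, 50, 1500)
y_commute <- 40 - 0.02 * x_rent + rnorm(200, sd = 20)

fit <- lm(y_commute ~ x_rent)
r   <- cor(x_rent, y_commute)

# RMS error computed directly from the residuals (divisor n)
rms_from_residuals <- sqrt(mean(resid(fit)^2))

# RMS error from SD_y * sqrt(1 - r^2), using the SD with divisor n
sd_pop_y <- sqrt(mean((y_commute - mean(y_commute))^2))
rms_from_formula <- sd_pop_y * sqrt(1 - r^2)

print(all.equal(rms_from_residuals, rms_from_formula))

# The same formula with the values reported for the cleaned survey data
ysd  <- 21.4372      # sd(fil_data$commute), which uses divisor n - 1
corr <- -0.2059461   # cor(fil_data$rent, fil_data$commute)
rms_reported_inputs <- ysd * sqrt(1 - corr^2)
print(round(rms_reported_inputs, 2))
```

The identity holds exactly (up to floating point) for least squares with an intercept. Applied to the reported values it gives about 20.98, close to the reported 20.96; small differences can arise from whether n or n − 1 is used as the divisor and from rounding.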
The residual plot also shows a heteroscedastic pattern (points cluster in the lower right), indicating that the spread of the residuals is not constant across fitted values.
Given this unequal spread and the large RMS error of 20.96 minutes, the regression line cannot predict commute accurately, and the residuals cannot reasonably be approximated as normally distributed with constant variance.
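A crude numeric complement to reading the residual plot is to compare the residual spread for low versus high fitted values. The sketch below does this on simulated toy data deliberately built to be heteroscedastic (hypothetical, not the survey data); it is an informal check, not a formal test such as Breusch-Pagan.

```r
set.seed(42)
# Hypothetical toy data whose noise grows with x (heteroscedastic); NOT the survey data
x <- runif(300, 50, 1500)
y <- 40 - 0.02 * x + rnorm(300, sd = 5 + 0.02 * x)

fit <- lm(y ~ x)
res <- resid(fit)
fitted_vals <- fitted(fit)

# Split observations at the median fitted value
half <- ifelse(fitted_vals <= median(fitted_vals), "lower fitted", "upper fitted")

# Residual SD within each half; very different values suggest non-constant variance
spread <- sapply(split(res, half), sd)
print(round(spread, 2))
```

Because the toy slope is negative, the lower fitted values come from large x, where the simulated noise is largest, so their residual SD is clearly larger. With the report's objects, the same comparison could be run on `resid(model)` and `fitted(model)`.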
Overall, other factors such as income and hours worked may have a greater influence on rent. Consistent with this, research reported by Duffy (2025) found no link between the number of international students and rent costs, as international students have different living requirements from local residents. Therefore, single variables such as rent and student type were unable to predict commute on their own.
```r
# Scatterplot of commute against rent with a least-squares line (all students)
scat1 <- ggplot(fil_data, aes(x = rent, y = commute)) +
  geom_point() +
  labs(title = "Rent and commute for domestic and international students",
       x = "Rent (AUD)", y = "Commute (mins)") +
  stat_smooth(method = "lm", col = "red")
scat1 + theme(
  plot.title = element_text(hjust = 0.5)
)
```

```r
# Same plot with points colour-coded by student type
scat2 <- ggplot(fil_data, aes(x = rent, y = commute, color = student_type)) +
  geom_point() +
  labs(title = "Rent and commute for domestic and international students",
       x = "Rent (AUD)", y = "Commute (mins)", color = "Student Type") +
  stat_smooth(method = "lm", col = "red")
scat2 + theme(
  plot.title = element_text(hjust = 0.5)
)
```

```r
# Domestic students only
scat3 <- ggplot(dfil_data, aes(x = rent, y = commute)) +
  geom_point() +
  labs(title = "Rent and commute for domestic students",
       x = "Rent (AUD)", y = "Commute (mins)") +
  stat_smooth(method = "lm", col = "red")
scat3 + theme(
  plot.title = element_text(hjust = 0.5)
)
```

```r
# International students only
scat4 <- ggplot(ifil_data, aes(x = rent, y = commute)) +
  geom_point() +
  labs(title = "Rent and commute for international students",
       x = "Rent (AUD)", y = "Commute (mins)") +
  stat_smooth(method = "lm", col = "red")
scat4 + theme(
  plot.title = element_text(hjust = 0.5)
)
```

```r
# Correlation and linear model of commute on rent
corr <- cor(fil_data$rent, fil_data$commute)
model <- lm(commute ~ rent, data = fil_data)
model
#> Call:
#> lm(formula = commute ~ rent, data = fil_data)
#>
#> Coefficients:
#> (Intercept)         rent
#>    38.73291     -0.01912
corr
#> [1] -0.2059461

# Residuals against fitted values; ggplot() converts the lm object via fortify(),
# which supplies the .fitted and .resid columns
ggplot(model, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "red") +
  labs(x = "Fitted", y = "Residual")

# Standard deviation of commute time (cleaned data)
ysd <- sd(fil_data$commute)
ysd
#> [1] 21.4372

# All cleaned rent values in ascending order
sort(fil_data$rent, decreasing = FALSE)
```

```
 [1] 50.0 50.0 50.0 50.0 50.0 50.0 50.0 75.0 80.0 80.0
[11] 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 120.0 125.0
[21] 130.0 130.0 150.0 150.0 150.0 150.0 150.0 150.0 150.0 150.0
[31] 150.0 150.0 150.0 150.0 151.0 155.0 160.0 160.0 165.0 170.0
[41] 170.0 175.0 180.0 184.0 185.0 187.0 189.0 190.0 190.0 197.5
[51] 200.0 200.0 200.0 200.0 200.0 200.0 200.0 200.0 200.0 200.0
[61] 200.0 200.0 200.0 200.0 200.0 200.8 206.0 210.0 210.0 210.0
[71] 217.0 220.0 222.0 225.0 225.0 230.0 230.0 240.0 240.0 240.0
[81] 240.0 245.0 245.0 250.0 250.0 250.0 250.0 250.0 250.0 250.0
[91] 250.0 250.0 250.0 250.0 250.0 250.0 250.0 250.0 250.0 250.0
[101] 257.0 260.0 260.0 265.0 265.0 270.0 270.0 272.0 272.5 272.5
[111] 275.0 275.0 275.0 275.0 280.0 280.0 280.0 280.0 285.0 285.0
[121] 285.0 287.5 290.0 295.0 300.0 300.0 300.0 300.0 300.0 300.0
[131] 300.0 300.0 300.0 300.0 300.0 300.0 300.0 300.0 300.0 300.0
[141] 300.0 300.0 300.0 300.0 300.0 300.0 300.0 300.0 300.0 300.0
[151] 300.0 300.0 300.0 300.0 300.0 300.0 300.0 300.0 305.0 307.0
[161] 312.5 317.0 320.0 320.0 320.0 320.0 320.0 320.0 320.0 325.0
[171] 330.0 330.0 330.0 335.0 340.0 345.0 345.0 347.0 348.0 350.0
[181] 350.0 350.0 350.0 350.0 350.0 350.0 350.0 350.0 350.0 350.0
[191] 350.0 350.0 350.0 350.0 350.0 350.0 350.0 350.0 350.0 350.0
[201] 350.0 350.0 350.0 350.0 350.0 350.0 350.0 354.0 354.0 354.0
[211] 354.0 356.0 359.0 360.0 360.0 360.0 360.0 360.0 360.0 360.0
[221] 360.0 360.0 360.0 362.0 362.0 366.0 366.0 367.0 367.0 367.0
[231] 367.0 367.0 367.0 368.0 370.0 370.0 375.0 375.0 375.0 375.0
[241] 378.0 378.0 378.0 378.0 380.0 380.0 380.0 380.0 380.0 380.0
[251] 385.0 385.0 386.0 387.0 390.0 390.0 391.0 391.0 392.0 392.0
[261] 392.0 400.0 400.0 400.0 400.0 400.0 400.0 400.0 400.0 400.0
[271] 400.0 400.0 400.0 400.0 400.0 400.0 400.0 400.0 400.0 400.0
[281] 400.0 400.0 400.0 400.0 400.0 400.0 400.0 400.0 400.0 400.0
[291] 400.0 400.0 400.0 400.0 410.0 410.0 410.0 410.0 410.0 415.0
[301] 420.0 420.0 420.0 420.0 420.0 430.0 430.0 430.0 430.0 438.0
[311] 439.0 439.0 439.0 440.0 440.0 442.5 450.0 450.0 450.0 450.0
[321] 450.0 450.0 450.0 450.0 450.0 450.0 450.0 450.0 450.0 450.0
[331] 450.0 450.0 450.0 450.0 450.0 450.0 450.0 450.0 450.0 450.0
[341] 450.0 450.0 450.0 450.0 450.0 450.0 450.0 452.0 452.0 452.0
[351] 452.0 452.0 456.0 456.0 459.0 459.0 459.0 459.0 460.0 460.0
[361] 460.0 460.0 460.0 460.0 460.0 462.5 463.0 469.0 469.0 469.0
[371] 469.0 470.0 470.0 470.0 475.0 475.0 475.0 475.0 475.0 477.0
[381] 479.0 479.0 480.0 480.0 480.0 480.0 480.0 480.0 480.0 480.0
[391] 480.0 480.0 480.0 480.0 480.0 480.0 480.0 485.0 489.0 490.0
[401] 490.0 498.0 500.0 500.0 500.0 500.0 500.0 500.0 500.0 500.0
[411] 500.0 500.0 500.0 500.0 500.0 500.0 500.0 500.0 500.0 500.0
[421] 500.0 500.0 500.0 500.0 500.0 500.0 500.0 500.0 500.0 500.0
[431] 500.0 500.0 500.0 500.0 500.0 500.0 500.0 500.0 500.0 500.0
[441] 500.0 500.0 500.0 500.0 500.0 500.0 500.0 500.0 500.0 500.0
[451] 509.0 510.0 510.0 520.0 520.0 520.0 520.0 520.0 520.0 520.0
[461] 520.0 520.0 525.0 525.0 525.0 529.0 529.0 529.0 530.0 530.0
[471] 530.0 530.0 530.0 539.0 539.0 540.0 540.0 540.0 540.0 540.0
[481] 544.0 545.0 549.0 549.0 549.0 549.0 549.0 549.0 549.0 549.0
[491] 549.0 550.0 550.0 550.0 550.0 550.0 550.0 550.0 550.0 550.0
[501] 550.0 550.0 550.0 550.0 550.0 550.0 550.0 550.0 550.0 550.0
[511] 550.0 550.0 550.0 550.0 550.0 550.0 560.0 560.0 560.0 560.0
[521] 569.0 569.0 569.0 570.0 575.0 575.0 578.0 580.0 580.0 580.0
[531] 584.0 585.0 586.0 586.0 589.0 589.0 589.0 589.0 590.0 592.0
[541] 595.0 599.0 599.0 599.0 599.0 599.0 599.0 599.0 599.0 600.0
[551] 600.0 600.0 600.0 600.0 600.0 600.0 600.0 600.0 600.0 600.0
[561] 600.0 600.0 600.0 600.0 600.0 600.0 600.0 600.0 600.0 600.0
[571] 600.0 600.0 600.0 600.0 600.0 600.0 600.0 600.0 600.0 600.0
[581] 600.0 600.0 600.0 600.0 600.0 600.0 600.0 600.0 610.0 612.0
[591] 612.5 618.0 619.0 620.0 620.0 620.0 625.0 625.0 625.0 629.0
[601] 629.0 629.0 630.0 630.0 633.0 633.0 633.0 633.5 637.0 637.0
[611] 637.0 640.0 640.0 640.0 640.0 650.0 650.0 650.0 650.0 650.0
[621] 650.0 650.0 650.0 650.0 650.0 650.0 650.0 650.0 650.0 650.0
[631] 650.0 650.0 650.0 650.0 650.0 650.0 652.0 659.0 659.0 660.0
[641] 660.0 660.0 660.0 660.0 670.0 672.0 675.0 675.0 679.0 680.0
[651] 680.0 682.0 689.0 689.0 689.0 689.0 690.0 690.0 690.0 695.0
[661] 699.0 699.0 699.0 699.0 699.0 700.0 700.0 700.0 700.0 700.0
[671] 700.0 700.0 700.0 700.0 700.0 700.0 700.0 700.0 700.0 700.0
[681] 700.0 700.0 700.0 700.0 700.0 700.0 700.0 700.0 700.0 700.0
[691] 700.0 700.0 700.0 700.0 700.0 700.0 700.0 700.0 700.0 709.0
[701] 709.0 710.0 715.0 719.0 720.0 725.0 725.0 725.0 729.0 729.0
[711] 730.0 730.0 730.0 735.0 739.0 739.0 739.0 739.0 740.0 747.0
[721] 749.0 749.0 750.0 750.0 750.0 750.0 750.0 750.0 750.0 750.0
[731] 750.0 750.0 750.0 750.0 750.0 750.0 756.0 759.0 759.0 759.0
[741] 759.0 759.0 759.0 760.0 760.0 769.0 769.0 769.0 770.0 770.0
[751] 770.0 770.0 773.5 775.0 775.0 779.0 779.0 779.0 779.0 779.0
[761] 779.0 780.0 780.0 780.0 780.0 780.0 785.0 788.0 789.0 789.0
[771] 789.0 789.0 789.0 799.0 799.0 799.0 799.0 799.0 799.0 799.0
[781] 800.0 800.0 800.0 800.0 800.0 800.0 800.0 800.0 800.0 800.0
[791] 800.0 800.0 800.0 800.0 800.0 800.0 800.0 800.0 800.0 800.0
[801] 800.0 800.0 800.0 800.0 800.0 800.0 800.0 800.0 800.0 800.0
[811] 802.0 802.0 809.0 809.0 815.0 815.0 819.0 819.0 819.0 819.0
[821] 820.0 820.0 820.0 821.0 825.0 829.0 829.0 830.0 830.0 839.0
[831] 839.0 839.0 850.0 850.0 850.0 850.0 860.0 860.0 870.0 875.0
[841] 878.0 879.0 880.0 880.0 890.0 890.0 896.0 899.0 899.0 900.0
[851] 900.0 900.0 900.0 900.0 900.0 900.0 900.0 900.0 900.0 900.0
[861] 900.0 900.0 909.0 909.0 909.0 909.0 909.0 913.0 919.0 939.0
[871] 940.0 950.0 950.0 950.0 950.0 950.0 980.0 980.0 1000.0 1000.0
[881] 1000.0 1000.0 1000.0 1000.0 1000.0 1000.0 1000.0 1000.0 1000.0 1000.0
[891] 1000.0 1000.0 1000.0 1000.0 1000.0 1020.0 1020.0 1040.0 1050.0 1078.0
[901] 1100.0 1100.0 1100.0 1200.0 1200.0 1200.0 1280.0 1300.0 1350.0 1411.0
[911] 1440.0
```
References:
Duffy, C. (2025, March 20). International students not to blame for rising rents, Australian study finds. ABC News. https://www.abc.net.au/news/2025-03-21/australia-rent-crisis-not-international-students-fault-study/105076290
Group of Eight. (2024, May 1). Policy brief: International students and housing and other cost of living pressures. https://go8.edu.au/policy-brief-international-students-and-housing-and-other-cost-of-living-pressures
We maintained the shared values of truthfulness and integrity by ensuring that the data we presented accurately and transparently reflected the findings of our analysis. All data cleaning is documented in this report, together with the code used to generate the graphs, so that the analysis can be reproduced. Additionally, we let the data guide our conclusions rather than relying on pre-existing beliefs or biases that could have distorted the interpretation of the results. Overall, by presenting all of our findings transparently, we upheld the values of truthfulness and integrity throughout our research process.
We upheld the ethical principle of Maintaining Confidence in Statistics by presenting and analysing our data accurately and clearly. We acknowledged and discussed the potential limitations and assumptions of the data set, and how factors such as sample size, response bias and outliers could affect the data and limit its reliability and accuracy for users. This transparency allows the data to be interpreted appropriately while maintaining public trust.
| Date | Time | Attendance | Type |
|---|---|---|---|
| 14/03 | 1 to 2:30 pm | Everyone | On campus |
| 22/03 | 1 to 3 pm | Everyone | Zoom call |
| 28/03 | 1 to 3 pm | Everyone | On campus |
| 4/04 | 1 to 3 pm | Everyone | Zoom call |
| 8/04 | 2 to 4:20 pm | Everyone | Zoom call |
| Member | Contributions | Resources used |
|---|---|---|
| Issac | | |
| Elisei | | |
| Loi | | |
| Jasmine | | |
| Chloe | | |
| Oscar | | |