DATA1001 Project Template

Author

Code
library(tidyverse)
data = read.csv("C:\\Users\\oswat\\OneDrive\\Desktop\\DATA1001\\data1001_survey_data_2025_S1.csv")

Recommendation/Insight

Our report explores the relationships between students’ rent, commute time, and residency status. Graphical and numerical analyses show that domestic students have longer commutes than international students. Linear modelling revealed a weak negative correlation between rent and commute time. Further data is needed to assess how accessible campus is for students.

Evidence

IDA including Source and structure of the survey

The data was derived from a survey completed by 2145 students (2103 consented) across 2024 Semester 2 and 2025 Semester 1, covering 28 variables. Our analysis focuses on how commute is influenced by rent and student type (domestic or international). We chose rent and commute because we believe students would know these values off the top of their head and could report them accurately, unlike variables such as expected/aimed mark and number of hours studied, which can be influenced by social desirability bias.

Limitations:

Some limitations of the data arise from selection and consent bias: responses are missing from DATA1001 students who did not participate in, or consent to, the survey. In addition, the data only reflects the sample of DATA1001/DATA1901 students and not the population of University of Sydney students. There is also response bias, as questions could have been misinterpreted or answered dishonestly.

Assumptions:

We assumed independence, that is, that responses to the survey were not influenced by other people’s responses; however, students completing the survey at the same time in a workshop could have influenced each other. We also assumed that extreme values were typos or incorrect answers rather than true values, and that a variable was approximately normally distributed if its histogram had a bell shape.

Data cleaning:

Data cleaning involved removing extreme values (identified using boxplots in R) to ensure data integrity, removing non-consenting participants, removing blank values in the “student_type” variable, and excluding rent values of “0”, which the survey indicated came from students living at home. After cleaning, our domestic sample size fell from 1279 to 221.
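
The effect of each cleaning rule on sample size can be tracked one step at a time. The sketch below is a minimal illustration in base R on a small made-up data frame: `toy`, its values, and the non-consent wording are all hypothetical and are not the survey data. Only the thresholds mirror those used in this report.

```r
# Hypothetical toy data, NOT the survey: five made-up respondents.
toy <- data.frame(
  consent = c("I consent to take part in the study",
              "I consent to take part in the study",
              "I do not consent",  # assumed wording for a non-consenting response
              "I consent to take part in the study",
              "I consent to take part in the study"),
  student_type = c("Domestic", "Domestic", "International", "", "International"),
  rent = c(0, 400, 500, 300, 650),
  commute = c(45, 30, 20, 15, 10),
  stringsAsFactors = FALSE
)

# Apply one rule at a time so the loss at each step is visible.
consented <- subset(toy, consent == "I consent to take part in the study")
typed     <- subset(consented, student_type != "")
renting   <- subset(typed, rent >= 50 & rent < 1500)
cleaned   <- subset(renting, commute > 0 & commute < 300)

# Number of rows remaining after each step.
step_counts <- c(raw = nrow(toy), consented = nrow(consented),
                 typed = nrow(typed), renting = nrow(renting),
                 cleaned = nrow(cleaned))
print(step_counts)
```

Running the same sequence on the real data would show how much of the drop in the domestic sample comes from the rent rule alone rather than from the other filters.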

Example graphs, before and after data cleaning

Code
#initial plot
bp1 <- ggplot(data, aes(x = student_type, y = rent, fill = student_type)) + 
  geom_boxplot() + 
  labs(title = "Relationship between rent of international and domestic students",
    x = "Student type", y = "Rent (AUD)", subtitle = "(Before IDA)", fill = "Student Type")
#design
bp1 + theme(
  plot.title = element_text(hjust = 0.5),
  plot.subtitle = element_text(hjust = 0.5)
)

Code
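#filtered data: consenting students with plausible rent and commute values
fil_data <- filter(data, consent == "I consent to take part in the study", 
                   rent >= 50 & rent < 1500,    # drops rent of 0 and extreme rents
                   commute > 0 & commute < 300, # drops zero and extreme commutes
                   student_type != "")          # drops blank student types
#plot after cleaning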
bp2 <- ggplot(fil_data, aes(x = student_type, y = rent, fill = student_type)) + 
  geom_boxplot() + 
  labs(title = "Relationship between rent of international and domestic students", 
       x = "Student type", y = "Rent (AUD)", subtitle = "(After IDA)", fill = "Student Type")
#design
bp2 + theme(
  plot.title = element_text(hjust = 0.5),
  plot.subtitle = element_text(hjust = 0.5)
)

How does the commute time vary between international and domestic students?

Commute times differ noticeably between domestic and international students.

The international students’ commute is positively skewed, with a lower median (20 minutes) than mean (26.3 minutes), implying that many live on campus.

Domestic students’ commutes are also positively skewed (median 25 minutes, mean 35.1 minutes), with a higher median than international students, suggesting that many domestic students live off campus.

The interquartile range (IQR) for international students was narrower (15 minutes) than for domestic students (35 minutes), suggesting that commute time is more consistent among international students. This suggests that international students occupy the majority of on-campus housing, while domestic students are spread through neighbouring suburbs.

This implies that domestic students live in a wider variety of areas, at varying distances from the university.

The upper limit we accepted for both types of students was 300 minutes, which was well above the observed maximum for international students (120 minutes) anyway. International students had 9 suspected outliers (> 52.5 minutes) compared with 3 for domestic students (> 102.5 minutes); this is most likely due to the larger international sample (690 vs 221 domestic students).
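
The outlier thresholds quoted above follow from the 1.5 × IQR rule that boxplots use for suspected outliers. As a short check, the sketch below recomputes the upper thresholds from the quartiles printed in the output further down; the helper name `upper_fence` is ours and purely illustrative.

```r
# Upper threshold for suspected outliers: Q3 + 1.5 * IQR.
# `upper_fence` is an illustrative helper, not part of the original analysis.
upper_fence <- function(q1, q3) {
  iqr <- q3 - q1
  q3 + 1.5 * iqr
}

# Quartiles taken from the quantile() output in this report.
fence_international <- upper_fence(q1 = 15, q3 = 30)  # IQR = 15
fence_domestic      <- upper_fence(q1 = 15, q3 = 50)  # IQR = 35

fence_international  # 52.5
fence_domestic       # 102.5
```

Because the domestic IQR is more than twice as wide, a commute has to be much longer before it is flagged as a suspected outlier for domestic students.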

Code
#initial plot
bp3 <- ggplot(fil_data, aes(x = student_type, y = commute, fill = student_type)) + 
  geom_boxplot() + 
  labs(title = "International and domestic commutes to USYD", x = "Student type",
       y = "Commute (mins)", fill = "Student Type") 

#design
bp3 + theme(
  plot.title = element_text(hjust = 0.5)
)

Code
# density histogram with both
hist1 <- ggplot(fil_data, aes(x = commute, fill = student_type)) + 
  geom_histogram(aes(y = after_stat(density)), bins = 25, position = "dodge") + 
  labs(x = "Commute (mins)", y = "Density", title = "Density of domestic and international commuters", fill = "Student Type")

hist1 + theme(
  plot.title = element_text(hjust = 0.5)
)

Code
#quantiles (five-number summaries) of commute by student type
dfil_data <- filter(fil_data, student_type == "Domestic")
ifil_data <- filter(fil_data, student_type == "International")

iqrd = quantile(dfil_data$commute)
iqri = quantile(ifil_data$commute)
iqrd
    0%    25%    50%    75%   100% 
  1.15  15.00  25.00  50.00 180.00 
Code
iqri
   0%   25%   50%   75%  100% 
  0.5  15.0  20.0  30.0 120.0 
Code
mean(dfil_data$commute)
[1] 35.05498
Code
sd(dfil_data$commute)
[1] 30.16238
Code
mean(ifil_data$commute)
[1] 26.30478
Code
sd(ifil_data$commute)
[1] 17.25875
Code
mean(fil_data$commute)
[1] 28.4275
Code
length(ifil_data$commute)
[1] 690
Code
length(dfil_data$commute)
[1] 221
Code
ddata <- filter(data, student_type == "Domestic")
length(ddata$commute)
[1] 1279

Research Question 2 (Linear Model)

The regression line (intercept 38.7 minutes, slope -0.01912 minutes per AUD of rent) and the weak negative correlation (r = -0.21) between commute and rent suggest that the higher the rent, the closer students live to the University of Sydney. Because the correlation is weak, this trend is not very consistent.
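
To make the fitted line concrete, the sketch below plugs rents into the coefficients reported in the model output. The function name `predict_commute` and the example rents of 300 and 800 AUD are illustrative choices, not values singled out in the analysis.

```r
# Coefficients and correlation copied from the lm() and cor() output in this report.
intercept <- 38.73291
slope     <- -0.01912
r         <- -0.2059461

# Illustrative helper: predicted commute (minutes) for a given rent (AUD).
predict_commute <- function(rent) intercept + slope * rent

predicted <- predict_commute(c(300, 800))
predicted  # about 33.0 and 23.4 minutes

# Change in predicted commute for every extra 100 AUD of rent.
per_100_aud <- slope * 100
per_100_aud  # about -1.9 minutes

# Share of the variation in commute accounted for by the line.
r_squared <- r^2
r_squared  # about 0.04
```

An r squared of roughly 0.04 is another way of seeing why the trend is inconsistent: rent accounts for only a small share of the variation in commute.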

The scatterplot lacked an overall linear pattern, indicating little linear association between the variables, and the RMS error was large at 20.96 minutes.
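
The RMS error can be recovered from the correlation and the spread of commute using the relationship RMS error = sqrt(1 - r^2) × SD of y. The sketch below is a check under that assumption, using the values printed in the output of this report. Whether the original figure used the sample SD (what `sd()` returns) or the population SD is not stated, so both are shown.

```r
# Values copied from the cor() and sd() output in this report; n is the
# number of rows in the cleaned data (690 international + 221 domestic).
r   <- -0.2059461
ysd <- 21.4372
n   <- 911

# RMS error using the sample SD returned by sd().
rms_sample <- sqrt(1 - r^2) * ysd

# RMS error using the population SD (divides by n instead of n - 1).
ysd_pop <- ysd * sqrt((n - 1) / n)
rms_pop <- sqrt(1 - r^2) * ysd_pop

round(c(sample = rms_sample, population = rms_pop), 2)
```

Both versions come out at about 21 minutes, which is close to the SD of commute itself, so the line predicts commute barely better than simply using the mean.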

Accordingly, the residual plot displays a heteroscedastic pattern (clustering in the lower right), indicating that the variance of the residuals is not constant across fitted values.

This large variance and RMS error of 20.96 indicate that our regression line cannot predict commute accurately, and that we cannot approximate our results with a normal distribution.

Overall, factors such as income and hours worked may have had a greater impact on rent. Similarly, research reported by Duffy (2025) found no link between the number of international students and rent costs, as international students have different living requirements from local citizens. Therefore, single variables such as rent and type of student were unable to predict commute.

Code
scat1 <- ggplot(fil_data, aes(x = rent, y = commute)) +
  geom_point() + 
  labs(title = "Rent and commute for domestic and international students", x = "Rent (AUD)", y = "Commute (mins)") +
    stat_smooth(method = "lm", col = "red")

scat1 + theme(
  plot.title =  element_text(hjust = 0.5)
)

Code
#colour coded
scat2 <- ggplot(fil_data, aes(x = rent, y = commute, color = student_type)) +
  geom_point() + 
  labs(title = "Rent and commute for domestic and international students", x = "Rent (AUD)", y = "Commute (mins)", color = "Student Type") +
    stat_smooth(method = "lm", col = "red")

scat2 + theme(
  plot.title =  element_text(hjust = 0.5)
)

Code
#domestic only
scat1 <- ggplot(dfil_data, aes(x = rent, y = commute)) +
  geom_point() + 
  labs(title = "Rent and commute for domestic students", x = "Rent (AUD)", y = "Commute (mins)") +
    stat_smooth(method = "lm", col = "red")

scat1 + theme(
  plot.title =  element_text(hjust = 0.5)
)

Code
#international only
scat1 <- ggplot(ifil_data, aes(x = rent, y = commute)) +
  geom_point() + 
  labs(title = "Rent and commute for international students", x = "Rent (AUD)", y = "Commute (mins)") +
    stat_smooth(method = "lm", col = "red")

scat1 + theme(
  plot.title =  element_text(hjust = 0.5)
)

Code
#residual model
corr = cor(fil_data$rent, fil_data$commute)
model = lm(commute ~ rent, data = fil_data)
model

Call:
lm(formula = commute ~ rent, data = fil_data)

Coefficients:
(Intercept)         rent  
   38.73291     -0.01912  
Code
corr
[1] -0.2059461
Code
ggplot(model, aes(x = .fitted, y = .resid)) + 
    geom_point() + 
    geom_hline(yintercept = 0, linetype = "dashed", colour = "red") + 
    labs(x = "Fitted", y = "Residual")

Code
#SD of commute, used for the RMS error
ysd = sd(fil_data$commute)
ysd
[1] 21.4372
Code
sort(fil_data$rent, decreasing = FALSE)
  [1]   50.0   50.0   50.0   50.0   50.0   50.0   50.0   75.0   80.0   80.0
 [11]  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  120.0  125.0
 [21]  130.0  130.0  150.0  150.0  150.0  150.0  150.0  150.0  150.0  150.0
 [31]  150.0  150.0  150.0  150.0  151.0  155.0  160.0  160.0  165.0  170.0
 [41]  170.0  175.0  180.0  184.0  185.0  187.0  189.0  190.0  190.0  197.5
 [51]  200.0  200.0  200.0  200.0  200.0  200.0  200.0  200.0  200.0  200.0
 [61]  200.0  200.0  200.0  200.0  200.0  200.8  206.0  210.0  210.0  210.0
 [71]  217.0  220.0  222.0  225.0  225.0  230.0  230.0  240.0  240.0  240.0
 [81]  240.0  245.0  245.0  250.0  250.0  250.0  250.0  250.0  250.0  250.0
 [91]  250.0  250.0  250.0  250.0  250.0  250.0  250.0  250.0  250.0  250.0
[101]  257.0  260.0  260.0  265.0  265.0  270.0  270.0  272.0  272.5  272.5
[111]  275.0  275.0  275.0  275.0  280.0  280.0  280.0  280.0  285.0  285.0
[121]  285.0  287.5  290.0  295.0  300.0  300.0  300.0  300.0  300.0  300.0
[131]  300.0  300.0  300.0  300.0  300.0  300.0  300.0  300.0  300.0  300.0
[141]  300.0  300.0  300.0  300.0  300.0  300.0  300.0  300.0  300.0  300.0
[151]  300.0  300.0  300.0  300.0  300.0  300.0  300.0  300.0  305.0  307.0
[161]  312.5  317.0  320.0  320.0  320.0  320.0  320.0  320.0  320.0  325.0
[171]  330.0  330.0  330.0  335.0  340.0  345.0  345.0  347.0  348.0  350.0
[181]  350.0  350.0  350.0  350.0  350.0  350.0  350.0  350.0  350.0  350.0
[191]  350.0  350.0  350.0  350.0  350.0  350.0  350.0  350.0  350.0  350.0
[201]  350.0  350.0  350.0  350.0  350.0  350.0  350.0  354.0  354.0  354.0
[211]  354.0  356.0  359.0  360.0  360.0  360.0  360.0  360.0  360.0  360.0
[221]  360.0  360.0  360.0  362.0  362.0  366.0  366.0  367.0  367.0  367.0
[231]  367.0  367.0  367.0  368.0  370.0  370.0  375.0  375.0  375.0  375.0
[241]  378.0  378.0  378.0  378.0  380.0  380.0  380.0  380.0  380.0  380.0
[251]  385.0  385.0  386.0  387.0  390.0  390.0  391.0  391.0  392.0  392.0
[261]  392.0  400.0  400.0  400.0  400.0  400.0  400.0  400.0  400.0  400.0
[271]  400.0  400.0  400.0  400.0  400.0  400.0  400.0  400.0  400.0  400.0
[281]  400.0  400.0  400.0  400.0  400.0  400.0  400.0  400.0  400.0  400.0
[291]  400.0  400.0  400.0  400.0  410.0  410.0  410.0  410.0  410.0  415.0
[301]  420.0  420.0  420.0  420.0  420.0  430.0  430.0  430.0  430.0  438.0
[311]  439.0  439.0  439.0  440.0  440.0  442.5  450.0  450.0  450.0  450.0
[321]  450.0  450.0  450.0  450.0  450.0  450.0  450.0  450.0  450.0  450.0
[331]  450.0  450.0  450.0  450.0  450.0  450.0  450.0  450.0  450.0  450.0
[341]  450.0  450.0  450.0  450.0  450.0  450.0  450.0  452.0  452.0  452.0
[351]  452.0  452.0  456.0  456.0  459.0  459.0  459.0  459.0  460.0  460.0
[361]  460.0  460.0  460.0  460.0  460.0  462.5  463.0  469.0  469.0  469.0
[371]  469.0  470.0  470.0  470.0  475.0  475.0  475.0  475.0  475.0  477.0
[381]  479.0  479.0  480.0  480.0  480.0  480.0  480.0  480.0  480.0  480.0
[391]  480.0  480.0  480.0  480.0  480.0  480.0  480.0  485.0  489.0  490.0
[401]  490.0  498.0  500.0  500.0  500.0  500.0  500.0  500.0  500.0  500.0
[411]  500.0  500.0  500.0  500.0  500.0  500.0  500.0  500.0  500.0  500.0
[421]  500.0  500.0  500.0  500.0  500.0  500.0  500.0  500.0  500.0  500.0
[431]  500.0  500.0  500.0  500.0  500.0  500.0  500.0  500.0  500.0  500.0
[441]  500.0  500.0  500.0  500.0  500.0  500.0  500.0  500.0  500.0  500.0
[451]  509.0  510.0  510.0  520.0  520.0  520.0  520.0  520.0  520.0  520.0
[461]  520.0  520.0  525.0  525.0  525.0  529.0  529.0  529.0  530.0  530.0
[471]  530.0  530.0  530.0  539.0  539.0  540.0  540.0  540.0  540.0  540.0
[481]  544.0  545.0  549.0  549.0  549.0  549.0  549.0  549.0  549.0  549.0
[491]  549.0  550.0  550.0  550.0  550.0  550.0  550.0  550.0  550.0  550.0
[501]  550.0  550.0  550.0  550.0  550.0  550.0  550.0  550.0  550.0  550.0
[511]  550.0  550.0  550.0  550.0  550.0  550.0  560.0  560.0  560.0  560.0
[521]  569.0  569.0  569.0  570.0  575.0  575.0  578.0  580.0  580.0  580.0
[531]  584.0  585.0  586.0  586.0  589.0  589.0  589.0  589.0  590.0  592.0
[541]  595.0  599.0  599.0  599.0  599.0  599.0  599.0  599.0  599.0  600.0
[551]  600.0  600.0  600.0  600.0  600.0  600.0  600.0  600.0  600.0  600.0
[561]  600.0  600.0  600.0  600.0  600.0  600.0  600.0  600.0  600.0  600.0
[571]  600.0  600.0  600.0  600.0  600.0  600.0  600.0  600.0  600.0  600.0
[581]  600.0  600.0  600.0  600.0  600.0  600.0  600.0  600.0  610.0  612.0
[591]  612.5  618.0  619.0  620.0  620.0  620.0  625.0  625.0  625.0  629.0
[601]  629.0  629.0  630.0  630.0  633.0  633.0  633.0  633.5  637.0  637.0
[611]  637.0  640.0  640.0  640.0  640.0  650.0  650.0  650.0  650.0  650.0
[621]  650.0  650.0  650.0  650.0  650.0  650.0  650.0  650.0  650.0  650.0
[631]  650.0  650.0  650.0  650.0  650.0  650.0  652.0  659.0  659.0  660.0
[641]  660.0  660.0  660.0  660.0  670.0  672.0  675.0  675.0  679.0  680.0
[651]  680.0  682.0  689.0  689.0  689.0  689.0  690.0  690.0  690.0  695.0
[661]  699.0  699.0  699.0  699.0  699.0  700.0  700.0  700.0  700.0  700.0
[671]  700.0  700.0  700.0  700.0  700.0  700.0  700.0  700.0  700.0  700.0
[681]  700.0  700.0  700.0  700.0  700.0  700.0  700.0  700.0  700.0  700.0
[691]  700.0  700.0  700.0  700.0  700.0  700.0  700.0  700.0  700.0  709.0
[701]  709.0  710.0  715.0  719.0  720.0  725.0  725.0  725.0  729.0  729.0
[711]  730.0  730.0  730.0  735.0  739.0  739.0  739.0  739.0  740.0  747.0
[721]  749.0  749.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0
[731]  750.0  750.0  750.0  750.0  750.0  750.0  756.0  759.0  759.0  759.0
[741]  759.0  759.0  759.0  760.0  760.0  769.0  769.0  769.0  770.0  770.0
[751]  770.0  770.0  773.5  775.0  775.0  779.0  779.0  779.0  779.0  779.0
[761]  779.0  780.0  780.0  780.0  780.0  780.0  785.0  788.0  789.0  789.0
[771]  789.0  789.0  789.0  799.0  799.0  799.0  799.0  799.0  799.0  799.0
[781]  800.0  800.0  800.0  800.0  800.0  800.0  800.0  800.0  800.0  800.0
[791]  800.0  800.0  800.0  800.0  800.0  800.0  800.0  800.0  800.0  800.0
[801]  800.0  800.0  800.0  800.0  800.0  800.0  800.0  800.0  800.0  800.0
[811]  802.0  802.0  809.0  809.0  815.0  815.0  819.0  819.0  819.0  819.0
[821]  820.0  820.0  820.0  821.0  825.0  829.0  829.0  830.0  830.0  839.0
[831]  839.0  839.0  850.0  850.0  850.0  850.0  860.0  860.0  870.0  875.0
[841]  878.0  879.0  880.0  880.0  890.0  890.0  896.0  899.0  899.0  900.0
[851]  900.0  900.0  900.0  900.0  900.0  900.0  900.0  900.0  900.0  900.0
[861]  900.0  900.0  909.0  909.0  909.0  909.0  909.0  913.0  919.0  939.0
[871]  940.0  950.0  950.0  950.0  950.0  950.0  980.0  980.0 1000.0 1000.0
[881] 1000.0 1000.0 1000.0 1000.0 1000.0 1000.0 1000.0 1000.0 1000.0 1000.0
[891] 1000.0 1000.0 1000.0 1000.0 1000.0 1020.0 1020.0 1040.0 1050.0 1078.0
[901] 1100.0 1100.0 1100.0 1200.0 1200.0 1200.0 1280.0 1300.0 1350.0 1411.0
[911] 1440.0

Articles

Group of Eight. (2024, May 1). Policy brief: International students and housing and other cost of living pressures. https://go8.edu.au/policy-brief-international-students-and-housing-and-other-cost-of-living-pressures

Duffy, C. (2025, March 20). International students not to blame for rising rents, Australian study finds. Abc.net.au; ABC News. https://www.abc.net.au/news/2025-03-21/australia-rent-crisis-not-international-students-fault-study/105076290

Professional Standard of Report

We maintained the shared values of truthfulness and integrity by ensuring that we presented data that accurately and transparently reflected the findings of our analysis. All data cleaning is shown in our report, along with the code we used to generate graphs, making our results reproducible. Additionally, we let the data guide our conclusions, rather than pre-existing beliefs or biases that would have negatively influenced the interpretation of any results. Overall, by presenting all of our findings transparently, we were able to uphold the values of truthfulness and integrity in our research process.

Ethical Principles

We upheld the ethical principle of Maintaining Confidence in Statistics by presenting and analysing our data accurately. We acknowledged and discussed the potential limitations and assumptions of the data set, and how factors such as sample size, outliers, and response bias would affect the data, limiting its reliability and accuracy for users. This transparency enables the data to be interpreted appropriately whilst maintaining public trust.

Acknowledgements

Group Meetings

Date     Time            Attendance    Type
14/03    1 to 2:30 pm    Everyone      On campus
22/03    1 to 3 pm       Everyone      Zoom call
28/03    1 to 3 pm       Everyone      On campus
4/04     1 to 3 pm       Everyone      Zoom call
8/04     2 to 4:20 pm    Everyone      Zoom call

Group Member Contributions

Member     Contributions    Resources used
Issac
Elisei
Loi
Jasmine
Chloe
Oscar