I am currently a first-semester international student at the University of Technology Sydney. Relocating to a new country to study means that I have to face many changes and adapt to a whole new environment. Hence, as the title suggests, this Quantified-Self Report focuses on analysing my lifestyle as a new overseas student in Sydney, in comparison with a group of students from my cohort, and how it has changed during the first few months. The Report is based on both group and personal data sets.
For the group data, after discussion, my group (DSI Octa) agreed to record two data sets: nutrition and fitness (structured data) and images of shopping receipts (unstructured data). Data was collected over a period of five weeks, from 25 August 2022 to 29 September 2022. Our team’s communications were conducted via WhatsApp and Zoom meetings. As most of us are new international students in Australia, by collecting these data sets we wanted to learn more about our own eating, sleeping and fitness patterns, as well as our shopping habits, in comparison with other group members.
For the personal data set, I analyse my personal expenses (both cash and cashless) over four months (July – October 2022). I am interested to know how I managed my finances and how my expenditure patterns changed during my first semester of settling into a new life in Sydney.
There are seven members in our group, most of whom are international students in their twenties (five out of seven). The other two members are domestic students, one of whom is 40 years old. This mix of similarity and variety in student visa statuses and ages is expected to give insightful comparisons among the team members. In addition, the ratio of female to male members is 4:3. These factors will be taken into consideration in the data analysis.
Nutrition and Fitness Data Set
Nutrition
Step Counts and Sleep Duration
Body Weight
Receipt Images Data Set
Nutrition and Fitness Data Set
Nutrition
Step Counts, Sleep Duration and Body Weight
Receipt Images Data Set
To ensure the anonymity of the data, each team member chose a pseudonym of their own preference. There are seven names used, namely “Bheeshma”, “Commuter”, “Hedwig”, “Lucifer”, “Magnolia”, “Orchard” and “Spinach”. My pseudonym is Magnolia.
Our collected data sets were stored in a shared Google Drive folder, which was accessible only to group members and the subject coordinator. This protected the privacy of the data and prevented anyone outside our group from accessing it.
As regards re-identification, the figures in the Body Weight data set make it relatively easy to infer the gender of each individual. Although body weight is not a direct identifier, it can be used to narrow down the identity of the group members. In addition, the Receipt Images data set is the most prone to re-identification. Our practice was that each team member uploaded a folder of their own receipt images to the group’s shared Google Drive folder. However, the “Owner” column shows the true name of the account. Therefore, if anyone outside the group gained access to this folder, the data anonymity would no longer be maintained. We should have realised this issue sooner because it could have had serious consequences.
Nutrition and Fitness Data Set
Figure 3.1.1 shows how many records are missing or duplicated. In particular, for the Water Intake, Step Count, Sleep Time and Total Calories data sets, for the period from 25 August - 29 September 2022, there should be 35 records per participant per data set, representing 35 days. However, Bheeshma, Lucifer and Orchard either provided additional data from outside the recording period, or had duplicated rows for the same day. These records were removed to ensure the consistency and accuracy of the data set. Magnolia and Spinach forgot to log their data on some days.
In the Body Weight data set, which was recorded weekly, two weeks of data were missing for Hedwig and Magnolia as they forgot to record them. Apart from that, there was one entry from someone named “Jupiter”. I assume this was a mistake, hence this record will be removed from the data set.
Figure 3.1.1 - Summary Table of Data Records
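The cleaning rules described above (dropping rows outside the recording period, duplicated rows for the same day, and entries under an unknown name) could be sketched as follows. This is only a minimal illustration using the Python standard library; the function name `clean_records`, the tuple layout and the sample rows are assumptions for the example and not the code or data actually used for this Report.

```python
from datetime import date

# Recording period and pseudonyms as stated in this Report.
START, END = date(2022, 8, 25), date(2022, 9, 29)
PSEUDONYMS = {"Bheeshma", "Commuter", "Hedwig", "Lucifer",
              "Magnolia", "Orchard", "Spinach"}

def clean_records(records):
    """Keep the first record per (name, day) that falls inside the period."""
    seen = set()
    cleaned = []
    for name, day, value in records:
        if name not in PSEUDONYMS:       # e.g. the stray "Jupiter" entry
            continue
        if not (START <= day <= END):    # logged outside the recording period
            continue
        if (name, day) in seen:          # duplicated row for the same day
            continue
        seen.add((name, day))
        cleaned.append((name, day, value))
    return cleaned

# Made-up rows for illustration only (not the group's real data).
sample = [
    ("Lucifer", date(2022, 8, 24), 9000),   # before the period
    ("Lucifer", date(2022, 8, 25), 12000),
    ("Lucifer", date(2022, 8, 25), 12000),  # duplicate of the row above
    ("Jupiter", date(2022, 9, 1), 70),      # unknown name
    ("Orchard", date(2022, 9, 29), 8000),
]
cleaned = clean_records(sample)
print(len(cleaned), "records kept")
```

Keeping the first occurrence of a duplicated day is a design choice made for this sketch; if the duplicated rows disagreed, they would need to be checked with the participant instead.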
Receipt Images Data Set
For the Receipt Images data, because everyone had different shopping routines, there is no fixed number of records that each person should have. Furthermore, it is difficult to determine whether someone forgot to collect a bill after payment, or lost some of the receipts.
Overall, the amount of missing and duplicated data is negligible and therefore does not materially affect the data quality.
As can be seen from Figure 3.2.1, most group members had consistent levels of calorie intake, except for me (Magnolia). The total calories that I consumed varied widely. In particular, there were two outliers at nearly 10,000 calories per day. After checking, I found that on 29 August and 30 August I had Vietnamese Bun Cha, which was recorded at 3,389 calories per serving. However, when I double-checked on MyFitnessPal’s website, Vietnamese Bun Cha was listed at only 810 calories per serving. The same discrepancy is observed in records of more than 5,000 calories per day. For example, hot chocolate and a bowl of jasmine rice are equivalent to 120 and 160 calories respectively according to the website, yet they were recorded as 940 and 669 calories respectively in the mobile application. Other users in the MyFitnessPal community have also experienced this issue. Therefore, the outliers and scattered data points in my records are attributed to the application issue and will not be removed from the data set.
Figure 3.2.1 - Outliers of Calories Intake Data Set
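For reference, one common way to flag outliers like those discussed above is the 1.5 × IQR rule used by box plots. The Report does not state which rule was applied, so the rule, the function name `iqr_outliers` and the sample values below are all assumptions made for illustration.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)   # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Made-up daily calorie totals for illustration only.
calories = [2100, 1900, 2300, 2500, 2000, 2200, 9800, 2400]
outliers = iqr_outliers(calories)
print(outliers)
```

Flagging a value this way only marks it for checking; whether it is kept, corrected or replaced still depends on what the participant confirms.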
In the Step Counts data set, there are outliers in Lucifer’s records: 30,000 steps on 09 September and 32,000 steps on 10 September (Figure 3.2.2). I checked with Lucifer whether he had any special events on those days. Although he could not recall exactly what happened, he thought that he might have done more cardio exercise on those two days, which made his step counts higher than on other days. In addition, Spinach had records of zero on 25 - 27 August as he forgot to record the data. These values will be replaced with null values in the data set.
Figure 3.2.2 - Outliers of Step Counts Data Set
As depicted in Figure 3.2.3, there is only one outlier in the Water Intake data set, which is the zero value in Spinach’s record on 26 August. This value will be replaced with a null value in the data set.
Figure 3.2.3 - Outliers of Water Intake Data Set
As shown in Figure 3.2.4, the Sleep Duration data set has outliers where Magnolia recorded 0 minutes of sleep on 26 and 27 August, and Orchard had a sleep duration of 754 minutes (more than 12 hours) on 02 September. After verifying, I confirmed that Magnolia forgot to record on those two days. Therefore, I will replace those records with null values instead of zero. For Orchard, the value was a typo and should be 454 minutes. It will be updated for the data analysis.
Figure 3.2.4 - Outliers of Sleep Duration Data Set
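The reason for replacing forgotten entries with null values rather than leaving them as zero is that a zero would drag the averages down. A minimal sketch of this treatment, together with the correction of a confirmed typo, is given below. The helper names, the record layout and the sample minutes are hypothetical; only the 754 to 454 correction comes from this Report.

```python
def zeros_to_missing(values):
    """Treat a logged 0 as 'not recorded' (None) rather than a real zero."""
    return [None if v == 0 else v for v in values]

def mean_ignoring_missing(values):
    """Average only the values that were actually recorded."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

# Made-up sleep minutes for illustration: two forgotten days logged as 0.
sleep_minutes = [0, 0, 420, 400, 410]
naive_mean = sum(sleep_minutes) / len(sleep_minutes)
cleaned_mean = mean_ignoring_missing(zeros_to_missing(sleep_minutes))
print(naive_mean, cleaned_mean)   # the zeros pull the naive mean down

# A confirmed typo is corrected rather than dropped (Orchard, 02 September).
corrections = {("Orchard", "2022-09-02"): 454}
record = {"name": "Orchard", "date": "2022-09-02", "minutes": 754}
key = (record["name"], record["date"])
record["minutes"] = corrections.get(key, record["minutes"])
```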
From Figure 3.2.5, it can be seen that all of our team’s bill amounts were around AUD 200 or under. There is only one outlier, a bill of AUD 295.89 from Hedwig on 03 September. I raised the question with Hedwig but did not receive a confirmation, so this data point remains as it is for my analysis.
Figure 3.2.5 - Outliers of Receipt Images Data Set
There is a difference in the volatility of calorie intake between male and female members (Figure 4.1.1). Males had fairly stable consumption during the period, ranging from 1,250 - 3,700 calories per day. For females, while the daily consumption was mostly under 3,000 calories a day, there were several days when the intake rose to 5,000 or 6,000 calories. Notably, during 24 - 29 September (StuVac week), they apparently ate more than usual.
Figure 4.1.1 - Average Calories Intake by Gender
As can be seen in Figure 4.1.2, overall, the team consumed an average of slightly more than 2,000 calories per day. On Mondays, Tuesdays and Saturdays, the group’s intake was above this average. However, this includes Magnolia’s outliers of nearly 10,000 calories per day on 29 August (Monday) and 30 August (Tuesday), which were caused by the discrepancy between the MyFitnessPal app and website (as explained earlier in the Outlier Detection section).
Figure 4.1.2 - Average Calories Intake During the Week
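To show how much the retained outliers can lift the Monday and Tuesday averages, the sketch below computes the mean intake per day of the week with and without very large records. The function name `average_by_weekday`, the cap of 5,000 calories and the sample rows are assumptions for illustration; the figures in this Report were computed with the outliers kept.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def average_by_weekday(records, cap=None):
    """Mean calories per weekday; records above `cap` are skipped if given."""
    by_day = defaultdict(list)
    for day, calories in records:
        if cap is not None and calories > cap:
            continue
        by_day[DAY_NAMES[day.weekday()]].append(calories)
    return {name: sum(vals) / len(vals) for name, vals in by_day.items()}

# Made-up records for illustration only.
records = [
    (date(2022, 8, 29), 9800),  # Monday, inflated app record
    (date(2022, 9, 5), 2200),   # Monday
    (date(2022, 8, 30), 9700),  # Tuesday, inflated app record
    (date(2022, 9, 6), 2000),   # Tuesday
    (date(2022, 8, 31), 2100),  # Wednesday
]
with_outliers = average_by_weekday(records)
without_outliers = average_by_weekday(records, cap=5000)
print(with_outliers["Monday"], without_outliers["Monday"])
```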
Figure 4.2.1 illustrates the water intake of the group. It is clear that, in general, men drank more water than women. From the graph, it can also be seen that Spinach and I did not drink much water during the day (mostly less than 1 litre).
Figure 4.2.1 - Water Intake of The Group
As Figure 4.2.2 suggests, there is a correlation between the number of steps and water intake only where step counts exceed 10,000. This pattern belongs to Lucifer, who exercised almost every day and recorded more than 10,000 steps per day. Below this point, there is no obvious relationship between the two variables in the data set.
Figure 4.2.2 - Correlation between Step Counts and Water Intake
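The observation above was made visually from the scatter plot. One way it could be checked numerically is to compute the Pearson correlation separately for days above and below 10,000 steps, as in the sketch below. The helper names and the sample pairs are made up for illustration and do not reproduce the group’s data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)   # assumes neither series is constant

def split_correlation(pairs, threshold=10_000):
    """Correlation of (steps, litres) at or below and above the threshold."""
    low = [(s, w) for s, w in pairs if s <= threshold]
    high = [(s, w) for s, w in pairs if s > threshold]
    return pearson(*zip(*low)), pearson(*zip(*high))

# Made-up (steps, litres of water) pairs for illustration only.
pairs = [
    (3000, 1.5), (4000, 0.8), (5000, 1.6), (6000, 0.9),
    (11000, 2.0), (12000, 2.2), (13000, 2.4), (14000, 2.6),
]
r_low, r_high = split_correlation(pairs)
print(round(r_low, 2), round(r_high, 2))
```

With only one person contributing most of the high-step days, a strong coefficient in that group would describe that individual’s habit rather than a general relationship.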
As depicted in Figure 4.3.1, males generally slept longer than females, with averages of 7.3 versus 6.9 hours per day. However, 75% of the records for both genders show less than 8 hours of sleep per day.
Figure 4.3.1 - Average Sleep Duration by Gender
Figure 4.3.2 shows sleeping patterns by gender during the week. Male team members slept at least 6 hours on most days of the week. On Sundays, they had more rest, with more than 7.5 hours. Females, however, had different sleeping habits. During the week (except for Thursday), they mostly slept 7 hours or less. They seemed to reserve weekends for rest, as the figures improved to 6.5 - 9 hours on those days.
Figure 4.3.2 - Sleep Pattern between Week Days and Weekends by Gender
From the histogram (Figure 4.4.1), we can see that most of the bills fall into the AUD 10-20 and AUD 40 categories, each with more than 10 bills.
Figure 4.4.1 - Frequency of Total Bill Amounts
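The frequency counts behind a histogram like this can be reproduced by grouping bill totals into fixed-width bins. The sketch below assumes a bin width of AUD 10, which this Report does not state, and uses made-up amounts apart from the AUD 295.89 outlier mentioned earlier.

```python
from collections import Counter

def bin_amounts(amounts, width=10):
    """Count bills per bin, keyed by the lower edge of each bin in AUD."""
    counts = Counter(int(amount // width) * width for amount in amounts)
    return dict(sorted(counts.items()))

# Made-up bill totals for illustration (295.89 is the outlier noted earlier).
amounts = [12.5, 18.0, 15.2, 43.1, 47.9, 295.89, 9.99]
bins = bin_amounts(amounts)
print(bins)
```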
Figure 4.4.2 illustrates clearly the shopping pattern of our group during the week. Specifically, people spent less at the beginning of the week (Monday - Wednesday). However, as the weekend approached, they had the tendency to do more shopping, especially on Saturdays.
Figure 4.4.2 - Total Spending During the Week
From the graph (Figure 4.5.1), it can be seen that rent accounted for most of my spending from July to October, at around AUD 9.5K in total. This expense alone is more than the total of all other categories. However, it is worth noting that the rent figure also includes the bond (AUD 2.4K), which will be refunded at the end of my lease period.
Surprisingly, the expenses for my daughter’s education are almost equal to my expenditure on groceries. As a new student in a New South Wales public primary school, my daughter needed a new uniform, which was costly (approximately AUD 550). Apart from that, my StuVac week (24 September - 02 October) overlapped with her term break (24 September - 09 October). Since all of my subjects’ assignments were due right after StuVac, I needed to send her to vacation care for three days so that I could concentrate on my studies. As an international student who is not entitled to the Australian government’s subsidy, I had to bear the high cost of the vacation care (AUD 223). However, this expense is expected to occur only occasionally, mainly during my exam weeks.
Figure 4.5.1 - Total Spending Per Category
As indicated in Figure 4.5.2 below, my monthly spending fluctuated widely during these four months. In July, my first month in Sydney, expenses stood at around AUD 2K because I was living at my friend’s place and was not yet paying rent. However, August’s expenditure rocketed to over AUD 9.5K when I moved into my apartment. Apart from rent-related expenses (bond and deposit), I had to spend a considerable amount on appliances and furniture (approximately AUD 2.5K) as the apartment came unfurnished. These were, however, one-off purchases and will be depreciated over a period of at least two years. Afterwards, in September and October, the figures started to reflect the actual domestic spending of AUD 3K - 3.5K per month. The difference of around AUD 500 between September and October is due to rent: there were five weeks of rent in September, while October had only four. Overall, I think I stayed within my budget.
Figure 4.5.2 - Total Spending Per Month
Looking further at my monthly spending per category (Figure 4.5.3), it can be seen that I made an effort to reduce expenses in several categories, including appliances and furniture, eating out, groceries, and transport. As I pursue a minimalist lifestyle, when moving into the new apartment in August, I decided to purchase only the bare minimum of items for my basic needs, such as a fridge, washing machine, mattress and study desk. After August, I barely bought any large items for the apartment. In addition, my spending on eating out dropped significantly from AUD 400 in August to only around AUD 70 in October (a reduction of nearly 83%). I tried to cook for myself to cut down this kind of expense. During these three months, I was also busy catching up with my studies (as a student from a non-IT background) and did not have time to go out with my friends, which also helped to reduce the eating out expenditure. As regards grocery shopping, in October I spent nearly two-thirds less than in August (AUD 350 versus AUD 950 respectively). After several shopping trips, I began going to different shops to buy different types of groceries. For example, I realised that Harris Farm Markets and Asian supermarkets often offered better prices for fruits, vegetables and meat compared to big supermarkets. Meanwhile, Woolworths, Coles and Aldi were the best options for half-price and discounted products. Last but not least, I hardly spent anything on transport in September and October. This is because my house is within walking distance of UTS and shopping locations, so I did not have to use public transport very often.
Figure 4.5.3 - Total Spending Per Month Per Category
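The percentage reductions quoted above can be checked with a short calculation. The sketch below uses the rounded amounts given in this Report (AUD 400 and 70 for eating out, AUD 950 and 350 for groceries); the function name `percent_reduction` is only illustrative.

```python
def percent_reduction(before, after):
    """Reduction from `before` to `after`, as a percentage of `before`."""
    return (before - after) * 100 / before

eating_out = percent_reduction(400, 70)    # August vs October, AUD
groceries = percent_reduction(950, 350)    # August vs October, AUD

print(f"Eating out: {eating_out:.1f}% lower")   # Eating out: 82.5% lower
print(f"Groceries: {groceries:.1f}% lower")     # Groceries: 63.2% lower
```

The grocery figure of about 63% is consistent with the description of a saving of nearly two-thirds.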
Teamwork
There are several areas in which our team could have done better. Firstly, if we had started at an earlier stage, there would have been enough time for trial and error, as well as feedback, to find a suitable data set or an optimal application. In reality, it was not until several days before the AT2B draft submission deadline that our group discussed the application to be used for the unstructured data. Therefore, as future data scientists, I think we should develop our project management skills.
Data Anonymity
As explained earlier in this Report, data anonymity was not well maintained because of our Receipt Images folders on Google Drive. An alternative would have been for everyone to send their files to the group leader, who would then upload them to the shared folder. In this way, the true identities of the group members would not be revealed.
Data Visualisation and Analysis
This Report has given me many opportunities to practise my data visualisation skills. For this Report, I used packages in Python to plot the graphs (mainly ggplot and seaborn). I had previously used only Excel, so this was the first time I wrote code to produce plots. There was a lot of trial and error, as well as lessons learned along the way. It was interesting that one type of graph can be presented in many different ways depending on the message that I want to convey. The exemplars and peer review were also useful in providing me with inspiration when I first started the analysis. In the future, I would like to learn more about data visualisation so that it becomes one of my strengths.
Nutrition
The discrepancy between MyFitnessPal’s application and website databases could lead to wrong conclusions, especially when users have limited experience with calorie tracking applications. Thus, it is essential for MyFitnessPal to fix this problem so that its reliability can be improved.
Sleep Duration
Given the limitation of calculating sleep duration from off-screen time, it would have been better if our group had recorded the actual sleep start and end times. In this way, we could have analysed not only the total sleep hours but also the sleeping pattern based on sleep start and end times.
Receipt Images
In my opinion, there are two major issues with this data set. The first is that our group did not use an OCR application to extract the data from the receipts. Instead of automating the process, we entered the data manually, which is both inefficient and error-prone. As future data scientists, I think we need to consider a better method. Our team should have spent time researching a solution that can convert the data from image files to a machine-readable format automatically. Fortunately, there were not many receipts for us to process. However, in a project with thousands or millions of receipts, manual entry would be impossible. The second problem is that this data set does not provide a holistic view of our shopping expenses, as it does not take into account transactions without receipts, such as online payments. Because of this incompleteness, I think this data set might not be able to generate meaningful and accurate insights. If I could do this Report again, I would choose a different data set, for example daily mood.
Nutrition
I have had a fairly healthy diet and will continue to maintain it. In my daily meals, I always balance carbohydrates, fibre and protein from different types of food. According to the UK National Health Service (2019), the recommended calorie intake for women is 2,000 cal per day. Not taking into account the outliers, my average calorie intake per day is around 2,500 cal, which is quite acceptable.
Water Intake
As clearly seen in the data visualisation, I drank the least water in the group (less than 800ml per day). I often did not drink until I felt thirsty, which I know is not good for my health. Although the recommended water intake varies between individuals, according to Health Direct (2021), women generally need 2l of water every day. Based on this, my initial goal is to drink 1.5l of water per day. To do this, I might need to drink from a bottle so that it is easier for me to know how much more I need to drink.
Step Counts
Research has found that walking 10,000 steps per day brings numerous health benefits (10000steps, n.d.). From the data set, I noticed that the days on which I took the most steps were those when I went out with my friends (around 10,000 steps per day), or went to university to attend lectures or for self-study (approximately 5,000 steps per day). Given my recent inactivity, I might need to go out with my friends or go to the library more often. During the term break, I might consider doing cardio exercises to improve my step counts as well as my heart rate.
Sleep Duration
From the Sleep Duration data set, I realised that I need to get more sleep. While the group averages were 6.9 - 7.3 hours of sleep per day, I had only 6.8 hours on average. According to the Sleep Health Foundation (n.d.), an adult needs to sleep for 7-9 hours per day to be able to revitalise themselves for the following day. The reason for this lack of sleep is that I needed to spend time catching up with all the new knowledge delivered in the MDSI course. One valuable lesson that I learned is that if I had prepared myself with some basic knowledge prior to the term, I would not have had so many difficulties during the semester. For instance, I spent one month learning SQL before the course’s commencement date, so when learning about databases, it was easy for me to keep up with the pace of the class. In contrast, since I did not learn Python logic and syntax in advance, it took me a lot of time to absorb the knowledge and apply it to the projects. Hence, based on this experience, I will spend the upcoming semester break preparing for the subjects in which I have enrolled for the Autumn 2023 term.
Personal Data Set
Overall, I am quite content with my personal finance management so far. I think that a minimalist lifestyle plays a vital role in my budget control. With that concept in mind, I always think twice before buying anything. I was not aware of the declining trend in my spending until doing this Quantified-Self Report. This Report has given me more data-informed insights into past events as well as clues about what I need to improve in the future.
However, as the small amount spent on transport reveals, I have not explored Sydney enough. Although I am glad that I could save money on transport expenses, it is a pity not to have enjoyed more of one of the ten most livable cities in the world. I have a bucket list of where to visit and what to eat in Sydney. With a better study plan and the upcoming long term break, hopefully I can tick all the boxes soon.
In conclusion, this Quantified-Self Report has helped me not only to obtain more knowledge about data science, but also to gain data-informed insights into different aspects of my life based on the collected data sets. In the hectic life of a new overseas student who has a lot to adapt to, this Report has provided a precious opportunity to learn and practise new academic knowledge and, at the same time, to reflect on my experiences in preparation for the journey ahead.
I will attach the data visualisation files in the submission.
10000steps. (n.d.). Counting Your Steps. https://www.10000steps.org.au/articles/healthy-lifestyles/counting-steps/#:~:text=Studies%20using%20the%2010%2C000%20steps,activity%20toward%20achieving%20this%20goal
Health Direct. (2021, May). Drinking water and your health. https://www.healthdirect.gov.au/drinking-water-and-your-health
National Health Service. (2019, October 24). What should my daily intake of calories be?. https://www.nhs.uk/common-health-questions/food-and-diet/what-should-my-daily-intake-of-calories-be/#:~:text=An%20ideal%20daily%20intake%20of,women%20and%202%2C500%20for%20men
Sleep Health Foundation. (n.d.). HOW MUCH SLEEP DO YOU REALLY NEED?. https://www.sleephealthfoundation.org.au/pdfs/HowMuchSleep-0716.pdf