In this assignment we analyse the travel patterns and daily journals of the team members. We compare the travel patterns of the team with a larger cohort, the people of NSW. We also investigate the relationship between daily travel (on foot and by public transport) and the journal entries, looking for related patterns.
The question we try to answer is: are there any significant patterns between travelling and our thoughts?
Image Credit: Destination NSW
A famous quote by Ibn Battuta makes the point: “Traveling - it leaves you speechless, then turns you into a storyteller.” This typically refers to the travel we do in our leisure time, outside our routine. Does our routine travel, including typical weekends, also turn us into storytellers? We will have answered that if we end up with a story to tell.
Previous studies suggest that an individual's mood or sentiment depends on their level of physical activity. Peluso and Andrade (2005) indicate that moderate activity boosts mood while extensive activity deteriorates it. Slaven and Lee (1997) support this claim, as they found that middle-aged women had improved moods with higher levels of physical exercise.
Another relevant study is “Mood and mode: does how we travel affect how we feel?” (Morris and Guerra 2015), which shows that the relationship is complicated and depends on several other variables.
The initial group meeting was used to discuss ideas and evaluate data points that could be gathered to tell a story. Some ideas were interesting, such as monitoring screen time and app usage patterns. This idea was discarded because existing studies already show that higher screen time is associated with poorer health and quality of life (Stiglic and Viner 2019). Another idea, monitoring health and eating habits, was approved by the group. The more interesting idea of connecting travel patterns with the daily journal was selected, while food habits were kept as a backup while more information was gathered.
Everyone agreed to capture their public transport travel as the quantified data. Opal card data was used, as Opal is the public transport ticketing system in NSW, and everyone signed up for an Opal account to access their data. The journal entry was agreed to be a brief once-a-day snippet about the day and its travel. Some people used an app initially while others recorded it elsewhere. Over the weeks everyone switched to the Day One journal app for the convenience of exporting data. The backup data set was food: everyone took a picture of one meal a day. The personal data was left up to individuals. I used a pedometer app to log my daily step count and distance, as it suited a travel-themed story.
| Data | Collection Level | Data Used | Measure | Capturing Method | Application Used | Analysis Performed |
|---|---|---|---|---|---|---|
| Travel | Group | Yes | Date, Time, Journey Detail, Amount, Full Fare, Discount | Card Tap at Journey start points | Opal Card | Monthly Opal Spending Analysis |
| Journal | Group | Yes | Date, Time, Location, Weather Condition, Weather temperature, Journal Text | Smart phone journal app | Day One | Text Analysis based on Travel Dates |
| Foot-Count | Individual | Yes | Step count, Distance travelled, Calories burnt | Enable app location tracking | Pedometer | Step count and Calories Burnt Analysis |
| Food | Group | No | Calories, cuisine (to be extracted) | Top-view picture | Phone Camera | None |
| Opal Sydney Data | Cohort (Sydney) | Yes | Month, Spend | Independent Study | Opal data | Spend analysis: cohort vs group vs individual |
The preferred communication channel was a Slack group (https://slack.com/), where everyone chipped in with thoughts, ideas and reminders. The other channel was the DSI class, where the group seating allowed conversations to flow within the group. The preferred data storage was Google Drive (https://www.google.com/drive/). The group's Google Drive folder, Avocado, held the assignment data files, including the cohort-level folder, the food folder, scripts for data extraction and, most importantly, the individual data folders of the team members.
| Data | Issue or Challenge | Solution or Impact |
|---|---|---|
| Travel Data | Not everyone uses public transport for all their journeys. Some people use personal transport such as a car or bike, and some prefer private services such as taxis. | This results in variance in travel information between members, depending on their reliance on public transport |
| Travel Data | Opal data can only be extracted monthly, as a PDF file. The PDF-to-CSV conversion loses data stored as images, namely the transport mode | Losing the transport mode leaves fewer parameters for analysis. It also adds noise to the data that this variable could have explained |
| Travel Data | Fare, full fare and discount information isn’t always captured, depending on the type of card used | For this analysis, the anomalous users are excluded |
| Travel Data | Journey duration or time is not captured | This restricts key analyses of journey distance or time. Other data sources may be needed for such an analysis |
| Journal | Not everyone remembers to update the journal every day. Some days have missing values, and some entries are added in retrospect | The data points are not continuous, so a time series analysis would have missing values |
| Journal | The Day One app captures additional information such as weather, location, wind and temperature, but this works well only for iPhone users | Restricts the analysis of these additional variables to iPhone users |
| Journal | The additional information is also lost or misleading when journal entries are added in retrospect, or transferred from a personal journal into the app later | If the additional information seems anomalous, the analysis should consider that the entry may have been added in retrospect |
| Foot-Count | Tracking steps with an app assumes that you always carry a charged phone when travelling on foot | Some steps go untracked, so the true step count is always >= the tracked count |
| Foot-Count | Because the app tracks phone movement all the time, battery optimisers and OS performance settings try to stop it from running in the background, and OS or default app updates reset these settings | On days when the app doesn’t have tracking permission, information is lost and values are missing. We have used a smaller but continuous window for our analysis |
| Foot-Count | There is no data export option in the pedometer app | As there is no export option, the entries were logged manually in a CSV file |
| Food | The food analysis of calories and cuisine was not integrating well with the travel theme. Also, initial testing of image processing on the food data didn’t show good results | The food collection was a backup data set, so it was discontinued after unpromising results |
| member_id | Transaction | Date | Time | Mode | Details | Journey | Fare | Full.fare | Discount | Amount |
|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 99 | 6/03/2019 | 17:07 | NA | Top up - Circular Quay, No. | | NA | NA | NA | 50 |
| 7 | 140 | 14/03/2019 | 16:32 | NA | Top up - Newtown | | NA | NA | NA | 50 |
| 2 | 92 | 9/04/2019 | 17:31 | NA | Top up - Opal Travel | | NA | NA | NA | 40 |
| 2 | 2246 | 5/03/2019 | 9:04 | NA | Auto top up - North Sydney | | NA | NA | NA | 40 |
| 2 | 2283 | 13/03/2019 | 9:32 | NA | Auto top up - North Sydney | | NA | NA | NA | 40 |
| 3 | 586 | 9/04/2019 | 6:51 | NA | Auto top up - Central | | NA | NA | NA | 40 |
Above is a snippet of the Opal data, along with a summary of the data set. Next, we analyse the missing values, since they will limit our analysis. The Mode field has no values at all; as discussed in the data quality issues section, it is lost in the PDF-to-CSV conversion. Three other fields, Discount, Full fare and Journey, also have some missing values, which may depend on the type of Opal card. Hence, we will remove the Mode field in our EDA section.
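The missing-value check described above can be sketched with Python's standard library. This is a minimal sketch, not the group's actual script (which is not shown); it assumes the Opal export has been converted to a CSV with the column names printed above, and that both empty cells and the literal string `NA` count as missing.

```python
import csv
import io

# Cells treated as missing: empty strings and R-style "NA" (an assumption about the export).
MISSING = {"", "NA"}


def missing_counts(csv_text):
    """Count missing cells per column; returns (counts, number_of_rows)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    counts = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)
            # DictReader fills short rows with None, so treat None as missing too.
            if value is None or value.strip() in MISSING:
                counts[name] += 1
    return counts, rows


# Two rows modelled on the snippet above (illustrative only).
SAMPLE = """member_id,Transaction,Date,Time,Mode,Details,Journey,Fare,Full.fare,Discount,Amount
7,99,6/03/2019,17:07,NA,"Top up - Circular Quay, No.",,NA,NA,NA,50
2,92,9/04/2019,17:31,NA,Top up - Opal Travel,,NA,NA,NA,40
"""

counts, rows = missing_counts(SAMPLE)
# Columns missing in every row (such as Mode) are candidates for removal.
empty_columns = [name for name, n in counts.items() if n == rows]
print(empty_columns)
```

Both sample rows are top-ups, so Journey and the fare columns are also empty here; in the full export, journey rows would fill them, leaving Mode as the only column that is missing everywhere.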
Let us first see a snippet of the journal data below and then analyse its missing values. Many of the weather-based fields have missing values. As explained in the data quality issues section, this is owing to non-iPhone users and to journal entries captured outside the Day One app.
Below is the step count data. Here the values are not missing, but data was tracked only for a few weeks, as will become clear in the Data Analysis section.
| Date | Step | Calorie |
|---|---|---|
| 1/04/2019 | 5205 | 217 |
| 2/04/2019 | 7236 | 304 |
| 3/04/2019 | 5616 | 234 |
| 4/04/2019 | 2709 | 119 |
The first chart shows the daily public transport spend of every group member, scattered over roughly two months. The group's average daily spend on public transport is roughly below $5, as indicated by the blue line. The trend is cyclic, mainly owing to the weekday and weekend nature of travel.
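The average line and the weekday/weekend cycle can be approximated by grouping spend by date. This is a minimal sketch, assuming journey charges have already been parsed into `(member_id, date, amount)` tuples with top-up rows removed; the report does not say whether the blue line averages over all members or only those who travelled that day, so this sketch uses the latter. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_group_average(trips):
    """trips: iterable of (member_id, 'D/MM/YYYY', amount) journey charges.

    Returns {date: mean spend over the members who travelled that day}.
    Top-up rows should be filtered out beforehand, as they are not spend.
    """
    per_day = defaultdict(lambda: defaultdict(float))
    for member, day, amount in trips:
        when = datetime.strptime(day, "%d/%m/%Y").date()
        per_day[when][member] += amount
    return {when: mean(members.values()) for when, members in per_day.items()}


def weekday_weekend_means(daily):
    """Split daily averages into weekday (Mon-Fri) and weekend means."""
    weekday = [v for d, v in daily.items() if d.weekday() < 5]
    weekend = [v for d, v in daily.items() if d.weekday() >= 5]
    return (mean(weekday) if weekday else None,
            mean(weekend) if weekend else None)


# Hypothetical journey charges: Monday 4 March and Saturday 9 March 2019.
trips = [(4, "4/03/2019", 3.5), (4, "4/03/2019", 3.5),
         (5, "4/03/2019", 4.0), (5, "9/03/2019", 2.0)]
daily = daily_group_average(trips)
print(weekday_weekend_means(daily))  # (5.5, 2.0)
```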
The second chart, a box plot, shows the monthly amount spent by each member more clearly. Since the May data is incomplete, the chart is skewed towards the lower side, but it still gives a fair idea of each member's average travel spending. Member 5 has the highest monthly average, while member 7 has the highest spend in a single month. Member 6 seems to be an infrequent user. To analyse further, we look at the bar charts.
The third chart clarifies the overall public transport usage of each member every month. Members 4 and 5 are extensive users of public transport. Member 7, the highest user in March, has become a very low activity user in May and June. This probably indicates a shift from public to private transport or a change of address (as confirmed by the member). Member 2, meanwhile, seems to be a very limited user.
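The monthly bars and the box plot both rest on per-member monthly totals. A minimal sketch of that aggregation, assuming the same hypothetical `(member_id, date, amount)` tuples as input and dropping incomplete months as the text does for May; the helper names are illustrative, not from the group's scripts.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_totals(trips, exclude_months=()):
    """Sum spend per (member_id, 'YYYY-MM'), skipping incomplete months."""
    totals = defaultdict(float)
    for member, day, amount in trips:
        month = datetime.strptime(day, "%d/%m/%Y").strftime("%Y-%m")
        if month not in exclude_months:
            totals[(member, month)] += amount
    return dict(totals)


def monthly_average(totals):
    """Average monthly spend per member; months with no travel are absent, not zero."""
    by_member = defaultdict(list)
    for (member, _month), total in totals.items():
        by_member[member].append(total)
    return {member: mean(values) for member, values in by_member.items()}


# Hypothetical charges; May 2019 is excluded as incomplete, as in the text.
trips = [(7, "6/03/2019", 4.0), (7, "20/03/2019", 6.0),
         (7, "2/04/2019", 1.5), (2, "3/05/2019", 3.0)]
totals = monthly_totals(trips, exclude_months={"2019-05"})
print(totals)                   # {(7, '2019-03'): 10.0, (7, '2019-04'): 1.5}
print(monthly_average(totals))  # {7: 5.75}
```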
Here the Sydney average Opal use data (Pocketbook data) is compared with the group and individual usage. The group average, although around 20% off, is close to the Sydney average Opal spend. The group spend is higher in March and lower in April. The individual spending is quite erratic and differs from both the city and group averages: it is close to double the group's average in March and about half of the city's average in April. The May data is excluded as the month's information is incomplete.
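The comparisons above ("around 20% off", "close to double", "about half") are relative differences against a reference value. A minimal sketch of that arithmetic; the figures in the example are hypothetical, not the actual spends behind the chart.

```python
def percent_difference(value, reference):
    """Signed difference of value from reference, as a percentage of reference."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (value - reference) * 100.0 / reference


def ratio(value, reference):
    """How many times larger value is than reference (2.0 means double)."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return value / reference


# Hypothetical monthly spends in dollars, not the actual figures behind the chart.
city, group, individual = 120.0, 96.0, 190.0
print(percent_difference(group, city))  # -20.0
print(ratio(individual, group))
```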
The step count analysis shows a sporadic pattern, indicating that I am not following a consistent exercise schedule. The step count on most days seems to be larger than the Australian average of 4500 steps (Althoff, Hicks et al. 2017). This is supported by the box plot, which is centred at roughly 7000 steps.
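The comparison with the national figure can be made explicit by computing the median (the centre line of a box plot) and the share of days above 4500 steps. This sketch uses only the four days shown in the snippet above, so its numbers illustrate the method rather than reproduce the full chart.

```python
from statistics import median

# Daily step counts from the snippet above (1-4 April 2019); the full series is not shown.
steps = [5205, 7236, 5616, 2709]
# Australian average quoted in the text from Althoff, Hicks et al. (2017).
AUSTRALIAN_AVERAGE = 4500


def summarise(counts, threshold):
    """Return the median (a box plot's centre line) and the share of days above threshold."""
    above = sum(1 for c in counts if c > threshold)
    return median(counts), above / len(counts)


print(summarise(steps, AUSTRALIAN_AVERAGE))  # (5410.5, 0.75)
```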
The other graph shows the calories burnt while walking. It mirrors the step count graph, which suggests that the pedometer takes little account of walking speed and bases its calorie estimate largely on distance.
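One way to check the claim that calories simply track steps is a Pearson correlation plus the calories-per-step ratio. This is a minimal sketch on the four days from the snippet above; the report itself compared the graphs visually, so the correlation is an added check rather than the author's method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Step and calorie values from the snippet above (1-4 April 2019).
steps = [5205, 7236, 5616, 2709]
calories = [217, 304, 234, 119]

r = pearson(steps, calories)
calories_per_step = [c / s for c, s in zip(calories, steps)]
print(round(r, 4))
print([round(x, 4) for x in calories_per_step])
```

Here r is close to 1 and the calories-per-step ratio stays near 0.04, which is what one would expect if calories are derived from step count or distance; four days are only suggestive, though.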
This chart compares my individual spend with the group average and the Sydney average. The Sydney Opal data covers train travel only, so overall public transport spending is likely to be higher than this average. Considering that, the group average is on par with or lower (in April) than the Sydney average spend on transport. Individually, my travel spend has become significantly lower than both the group and Sydney averages.
This chart shows the words occurring more than three times in the journal data. It includes the entire group's entries, and the theme clearly revolves around travel, with references to train, traffic and commute. Some words carry a positive connotation (“nice, sunny, enjoyable, great”), while others carry a negative one (“traffic, bad, late”). Overall, the word summary gives a mixed sentiment within a travel-based theme.
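The "more than three times" filter can be sketched with `collections.Counter`. This assumes journal texts are available as a list of strings; the small stop-word list and the example entries are hypothetical, and a real analysis would use a fuller stop-word list.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption); real analyses use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "to", "was", "in", "of", "on", "it",
              "i", "my", "at", "for", "is"}


def frequent_words(entries, min_count=4):
    """Words appearing more than three times (count >= min_count) across all entries."""
    counts = Counter()
    for text in entries:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return {word: n for word, n in counts.items() if n >= min_count}


# Hypothetical journal entries.
entries = [
    "Train was late again, bad traffic on the bridge.",
    "Nice sunny day, the train was on time.",
    "Long commute, the train was packed.",
    "Train to the city, enjoyable walk after.",
]
print(frequent_words(entries))  # {'train': 4}
```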
This analysis uses the personal journal entries from days with a higher step count. The measure used is calories burnt > 250 per day since, as we saw earlier, the calorie count moves in tandem with the step count. Here we examine the texts for key words that indicate a behavioural pattern. Because the data set is very narrow, multiple charts (with varying seed values) were used to pick out the less frequent keywords as well. This revealed certain positive keywords in the text (“Pretty, Lovely, Good, Amazing, Great, Mesmerising, Perfect”) and a few negative ones (“stuck, chilly”). The sentiment was largely positive on days with a higher step count.
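Splitting the journal by activity level amounts to joining the two sources on date and applying the 250-calorie threshold. A minimal sketch, assuming both sources have been keyed by `datetime.date`; the function name and journal texts are hypothetical, while the calorie values come from the step-count snippet.

```python
from datetime import date

CALORIE_THRESHOLD = 250  # "calories burnt > 250 per day", as in the text


def split_by_activity(journal, calories, threshold=CALORIE_THRESHOLD):
    """journal: {date: text}; calories: {date: calories burnt}.

    Returns (high_activity_texts, low_activity_texts) in date order, keeping
    only dates present in both sources (days without step data are dropped).
    """
    high, low = [], []
    for day in sorted(journal.keys() & calories.keys()):
        target = high if calories[day] > threshold else low
        target.append(journal[day])
    return high, low


# Calories from the step-count snippet; journal texts are hypothetical.
calories = {date(2019, 4, 1): 217, date(2019, 4, 2): 304,
            date(2019, 4, 3): 234, date(2019, 4, 4): 119}
journal = {date(2019, 4, 2): "Lovely walk along the harbour",
           date(2019, 4, 4): "Tiring day, stuck at home",
           date(2019, 4, 5): "No step data for this day"}
high, low = split_by_activity(journal, calories)
print(high)  # ['Lovely walk along the harbour']
print(low)   # ['Tiring day, stuck at home']
```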
Conversely, this chart shows the text logs from days with a low step count. Again, multiple seed values were used to pick out sentiment keywords that may not all be evident in this chart. The sentiment keywords observed included “complains, tiring, dragging, sick, happy, great”. Here there is a mix of positive and negative connotations, making it hard to conclude the overall sentiment.
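The high- and low-activity comparison was made by reading word clouds; a simple lexicon tally is one way to make it repeatable. This is a swapped-in technique, not the author's method: the keyword lists below contain only the words reported in the text, whereas a published sentiment lexicon would be far larger, and the example entries are hypothetical.

```python
import re

# Keyword lists built only from the words reported in the text (an assumption).
POSITIVE = {"pretty", "lovely", "good", "amazing", "great", "mesmerising",
            "perfect", "nice", "sunny", "enjoyable", "happy"}
NEGATIVE = {"stuck", "chilly", "complains", "tiring", "dragging", "sick",
            "bad", "late"}


def sentiment_tally(texts):
    """Return (positive_hits, negative_hits, score in [-1, 1])."""
    pos = neg = 0
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in POSITIVE:
                pos += 1
            elif word in NEGATIVE:
                neg += 1
    total = pos + neg
    score = (pos - neg) / total if total else 0.0
    return pos, neg, score


# Hypothetical entries for high- and low-activity days.
high_days = ["Lovely sunny walk, amazing views", "Got stuck but great day"]
low_days = ["Tiring day, feeling sick", "Happy to be home"]
print(sentiment_tally(high_days))  # (4, 1, 0.6)
print(sentiment_tally(low_days))
```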
The report analysed the travel patterns of the group as well as the individual. Public transport use varied among the group: there were some heavy users and some light users. My personal public transport data (member 7) was distinctive, as it moved from the heaviest usage to low usage, pointing to a key event: in my case, a change of address.
My personal step count data was sporadic. It was higher than the national average (Althoff, Hicks et al. 2017) on a monthly basis, but there were many days below it.
The journal data contained one unquantified field: the text. The high-frequency word analysis of this field showed a mixed sentiment. The other, quantified journal fields were discounted because they were specific to iPhone users.
The most interesting part of the study was the sentiment analysis on days with a higher versus a lower step count. Previous studies have indicated the positive effects of physical activity on mood (Peluso and Andrade 2005). Although based on one individual's data over a short interval, this analysis is consistent with such studies: the days with higher recorded physical activity showed largely positive sentiment compared with those with less.
The group and individual journal data could be analysed against city-wide commuter satisfaction reports to look for patterns between the different groups. The journal data could also be analysed by day of the week, or by days with high or low public transport use.
The step count data could also be compared against the wider population, broken down by gender, age group or locality, to see how the individual data compares with the larger audience.
The growing reliance on computer-based applications in daily life also raises moral questions about such practices. Dependence on such algorithms poses issues of fairness and human rights, including privacy (Floridi and Taddeo 2016).
To use this data for analysis, we must respect the users' privacy and anonymise the data. This is done by using a member id instead of an individual's name. Pseudonyms are also shared among group members for the purpose of name-based analysis. The data is also handled ethically by keeping it in a private cloud folder accessible to the group. Even so, an individual's geo-localised history can act as a quasi-identifier, which in turn could be used to access their personal and sensitive information (Bettini, Wang et al. 2005).
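Replacing names with member ids can be sketched as below. This is a minimal sketch under stated assumptions: records are dicts with a hypothetical `name` field, and ids are assigned in order of first appearance. It removes direct identifiers only; as noted above, location histories can still act as quasi-identifiers.

```python
def pseudonymise(records, name_field="name"):
    """Replace each name with a sequential member id.

    Returns (clean_records, key). The key maps name -> member id and should be
    stored apart from the shared data so the ids cannot be reversed by others.
    """
    key = {}
    clean = []
    for record in records:
        name = record[name_field]
        if name not in key:
            key[name] = len(key) + 1
        row = {field: value for field, value in record.items() if field != name_field}
        row["member_id"] = key[name]
        clean.append(row)
    return clean, key


# Hypothetical records; the names are placeholders, not group members.
records = [{"name": "Alex", "amount": 4.0},
           {"name": "Sam", "amount": 2.5},
           {"name": "Alex", "amount": 3.0}]
clean, key = pseudonymise(records)
print(clean)
# [{'amount': 4.0, 'member_id': 1}, {'amount': 2.5, 'member_id': 2}, {'amount': 3.0, 'member_id': 1}]
```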
This project has collected a lot of personal data about users, including travel and journal logs. Analysing the metadata patterns could reveal individuals' home and work locations and the times they travel between them. This sensitive information must be handled according to the Australian Privacy Principles in the Privacy Act 1988 (Privacy Act 1988). The consent of everyone involved was essential, and the data must be used only for the purpose for which it was collected, and stored privately and securely.
This project has given me good insights into my travel-based patterns. The key observation, that days of higher physical activity coincided with more positive sentiment, is a great motivation to maintain a higher level of activity. The sporadic step count pattern could be replaced by a consistent daily routine that keeps me at least above average.
My highest monthly travel spend indicated a need for better travel management, as it also implied a large amount of time spent on public transport. Thankfully this has been corrected, so I should cherish the spare time gained and put it to good use.
Another pattern observed was the difference in data collection practices. Today there are several tools and technologies available to solve the same problem. In our case, even when we chose the same application to collect data, the metadata differed between the Android and iOS devices used. Another pattern was that the way we interact with the same technology creates differences in the data. For example, while we ultimately used the Day One app to record journal data, some members used it consistently, while others updated the entries on one or two days from handwritten notes or notes logged elsewhere. This created false location metadata for the entries updated in bulk, which were then excluded from the analysis. Hence, while we achieved consistency in the data collection tools, there should also be discussions around how we interact with those tools and the implications of platform-specific versions, such as the iOS app versus the Android app.
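Entries updated in bulk can be flagged before analysis rather than spotted by hand. A minimal sketch, assuming each exported entry carries hypothetical `member_id`, `entry_date` and `created_on` fields; the source does not describe the Day One export format, so these field names are illustrative only.

```python
from collections import Counter
from datetime import date


def flag_suspect_entries(entries, max_per_day=1):
    """Flag journal entries whose metadata may be unreliable.

    An entry is flagged if it was created on a different day than the day it
    describes, or if its author created more than max_per_day entries that day.
    """
    created = Counter((e["member_id"], e["created_on"]) for e in entries)
    flagged = []
    for e in entries:
        retrospective = e["created_on"] != e["entry_date"]
        bulk = created[(e["member_id"], e["created_on"])] > max_per_day
        if retrospective or bulk:
            flagged.append(e)
    return flagged


# Hypothetical entries: member 2 wrote two days' entries on 5 April.
entries = [
    {"member_id": 1, "entry_date": date(2019, 4, 1), "created_on": date(2019, 4, 1)},
    {"member_id": 2, "entry_date": date(2019, 4, 3), "created_on": date(2019, 4, 5)},
    {"member_id": 2, "entry_date": date(2019, 4, 4), "created_on": date(2019, 4, 5)},
]
print(len(flag_suspect_entries(entries)))  # 2
```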
Althoff, T., J. L. Hicks, A. C. King, S. L. Delp and J. Leskovec (2017). “Large-scale physical activity data reveal worldwide activity inequality.” Nature 547(7663): 336.
Bettini, C., X. S. Wang and S. Jajodia (2005). Protecting privacy against location-based personal identification. Workshop on Secure Data Management, Springer.
Floridi, L. and M. Taddeo (2016). “What is data ethics?” The Royal Society.
Peluso, M. A. M. and L. H. S. G. d. Andrade (2005). “Physical activity and mental health: the association between exercise and mood.” Clinics 60(1): 61-70.
Slaven, L. and C. Lee (1997). “Mood and symptom reporting among middle-aged women: The relationship between menopausal status, hormone replacement therapy, and exercise participation.” Health Psychology 16(3): 203.
Stiglic, N. and R. M. Viner (2019). “Effects of screentime on the health and well-being of children and adolescents: a systematic review of reviews.” BMJ Open 9(1): e023191.
Pocketbook. “Does the Opal card actually save you money?” https://getpocketbook.com/blog/does-the-opal-card-actually-save-you-money/