Project

STAT 231: Calendar Query

Author

Tiffany Sun

Published

March 15, 2025

Introduction

How do I spend my day?

This calendar query project will address the overarching question by uncovering two key trends: my activity trends and my sleep trends. My first objective is to determine the average time I allocate to 9 types of activities: classes, entertainment, extracurriculars, homework, meals, meetings, naps, personal work, and social hangouts. My second objective is to explore my sleep trends; figuring out the average amount of sleep I get and how tired I feel each day will help me understand both my sleeping patterns and my ideal amount of sleep. Through these objectives, the calendar query project will help me understand myself better and inform how I might restructure my day-to-day lifestyle going forward.

General Questions of Interest

What do I do on a daily basis?
What is the average amount of sleep I get?
What is the ideal amount of sleep I should get to function optimally?

Methods

Data collection

The data collection process will use Google Calendar for documenting sleep and activity trends. Activities are categorized into nine types described below. Start and end times will be recorded before and after the activity.

Classes - Courses I am taking this semester.
Entertainment - Engagement with digital social platforms, movies, and games.
Extracurriculars - Involvement in student organizations.
Homework - Homework or study sessions for exams.
Meals - Breakfast, Lunch, and Dinner.
Meetings - Interviews, Office Hours, and Alumni Calls.
Naps
Personal Work - Work that is not academic (e.g., job applications and chores).
Social Hangouts - Time when I am with my friends.

Sleep trends will be documented quantitatively and qualitatively. Start and end times will be tracked before and after sleep respectively. At the end of each day, tiredness levels will be rated on a scale of 1-4:

1 → Fully Alert: Did not feel sleepy or doze off and had no difficulty staying focused.

2 → Mildly Tired: Occasionally felt fatigued or unfocused, possibly spaced out, but did not actually fall asleep.

3 → Moderately Tired: Struggled to stay awake; experienced micro-sleep (brief involuntary sleep episodes) at most once.

4 → Severely Tired: Frequently struggled to stay awake; micro-slept multiple times and had to take a nap during the day.

The data collection will finish by exporting the calendar as an .ics file and importing that file into R.

Data wrangling

The dataset ‘mycal’ is created from the imported calendar data and tracks all activities from February 5th, 2025 to February 24th, 2025. For each activity, it includes the calculated duration in hours, start and end times, date, description, and type. Columns containing the UID of the activities, last modification date, event status, and month and year numbers were excluded because they were unnecessary for the data analysis. The text in the ‘description’ and ‘activity_type’ columns was converted to lowercase and stripped of extra spaces for consistency, which makes later data analysis and visualization more convenient.

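# Packages used by the code in this report
library(tidyverse)   # dplyr, stringr, ggplot2
library(lubridate)   # with_tz(), interval(), hours(), date()
library(ical)        # ical_parse_df()
library(knitr)       # kable()
library(kableExtra)  # kable_minimal()
library(scales)      # breaks_width()

# Data import
cal_import <- ical_parse_df("project_data/tiffany_calendar.ics")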

# Data wrangling
mycal <- cal_import |>
  # Google Calendar event names are in a variable called "summary";
  # "activity_type" is a more relevant/informative variable name.
  rename(activity_type = summary) |>
  mutate(
    # Specify time zone (defaults to UTC otherwise)
    across(c(start, end), 
           .fns = with_tz, 
           tzone = "America/New_York"),
    # Compute duration of each activity in hours
    duration_hours = interval(start, end) / hours(1),
    date = date(start),
    year = year(start),
    month_number = month(start),
    # Convert text to lowercase and remove repeated/leading/trailing 
    # spaces to help clean up inconsistent formatting.
    across(c(activity_type, description), 
           .fns = str_to_lower),
    across(c(activity_type, description), 
           .fns = str_squish)
  ) |>
  # Removing unnecessary columns. 
  select(-uid, -last.modified, -status, -month_number, -year)
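
The time zone conversion and duration calculation above can be illustrated on a single event. This is a minimal sketch with a made-up sleep entry (the timestamps and variable names are hypothetical, not taken from my calendar), assuming the parsed times arrive in UTC as the comment in the wrangling code notes.

```r
library(lubridate)

# Hypothetical event (not from the project data), expressed in UTC
start_utc <- ymd_hms("2025-02-06 04:30:00", tz = "UTC")
end_utc   <- ymd_hms("2025-02-06 11:00:00", tz = "UTC")

# Convert to Eastern time: the clock time changes, the instant does not
start_local <- with_tz(start_utc, tzone = "America/New_York")
end_local   <- with_tz(end_utc, tzone = "America/New_York")

# Duration in hours, computed the same way as duration_hours in mycal
duration_hours <- interval(start_local, end_local) / hours(1)

# The event's date is taken from its local start time
event_date <- date(start_local)

print(start_local)
print(duration_hours)
print(event_date)
```

In this toy case the entry is dated February 5 even though most of the sleep happens on February 6, because the date is derived from the start time. The same convention applies to any event in ‘mycal’ that crosses midnight.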

A smaller dataset ‘mycal_sleep’ was created for sleep trends. The variable ‘description’ was renamed to ‘tiredness_level’ for better readability, followed by recoding the values into the four levels mentioned in Methods.

# Creating a filtered dataset for Sleep Trends
mycal_sleep <- mycal |>
  filter(activity_type == "sleep") |>
  rename(tiredness_level = description) 

# Recoding tiredness_level as a factor with descriptive labels
mycal_sleep$tiredness_level <-
  mycal_sleep$tiredness_level |>
  #[1]
  factor(levels = c(1, 2, 3, 4),
         labels = c("Fully Alert", "Mildly Tired", "Moderately Tired", 
                    "Severely Tired"))

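The recoding step can be checked on a small vector. This is a sketch using made-up ratings (the `ratings` vector is hypothetical, not my data); it assumes the ratings arrive as text, as they would from the calendar description field.

```r
# Hypothetical ratings (not the project data), stored as text
ratings <- c("4", "2", "4", "1", "5")

tiredness <- factor(
  ratings,
  levels = c(1, 2, 3, 4),
  labels = c("Fully Alert", "Mildly Tired", "Moderately Tired",
             "Severely Tired")
)

# A value outside levels 1-4 (here "5") becomes NA instead of raising an error
print(tiredness)
print(table(tiredness, useNA = "ifany"))
```

Because unmatched values silently turn into NA, counting the NA entries after recoding is a quick way to catch a mistyped rating.
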
Results

Summary Table

Table 1 below summarizes the average, minimum, maximum, and median duration in hours of a single recorded event for each activity type. Meetings were the shortest activity type, with an average duration of approximately 0.77 hours. Sleep was the longest, with an average duration of approximately 6.31 hours. Most of the remaining activity types averaged around 1 to 2 hours per event.

mycal |>
  group_by(activity_type) |>
  # Calculating average, min, max, and median of duration hours rounded to 
  # the nearest hundredth
  summarize(average_hours = round(mean(duration_hours), digits = 2), 
            min_hours = round(min(duration_hours), digits = 2), 
            max_hours = round(max(duration_hours), digits = 2), 
            median_hours = round(median(duration_hours), digits = 2)) |>
  # Arranging activity types by shortest to longest duration time
  arrange(average_hours) |>
  # Labels and formatting
  kable(caption = "Summary of All Activity Types",
        col.names = c("Activity Type", "Average Hours", "Minimum Hours", 
                      "Maximum Hours","Median Hours")) |>
  kable_minimal()

Summary of All Activity Types
Activity Type	Average Hours	Minimum Hours	Maximum Hours	Median Hours
meeting	0.77	0.17	2.00	0.50
meal	1.03	0.50	1.50	1.00
classes	1.31	0.83	3.00	1.33
nap	1.43	0.50	4.00	1.00
personal work	1.43	0.33	4.42	0.88
social hangout	1.85	0.50	5.00	1.50
homework	1.86	0.50	9.50	1.25
extracurricular	2.22	1.00	4.00	2.00
entertainment	2.25	0.75	6.25	2.00
sleep	6.31	4.92	11.83	6.08

Figure 1: Activities Trend

Figure 1 is a segmented bar chart illustrating how I spent my days from February 5th to February 24th. My days, meaning the time I wasn’t sleeping, typically ranged from 15 to 20 hours. During the first five days, entertainment took up about half of each day. On weekends, especially February 15th and February 22nd, social hangouts took up a significant portion of my days. Most of my days were characterized by entertainment, classes, homework, meals, and naps.

mycal |> 
  filter(activity_type != "sleep") |>
  # Description is specifically for sleep
  select(-description) |>
  # Segmented bar chart of all days, bars differentiated by color in 
  # a qualitative color palette
  ggplot(aes(fill = activity_type, y = duration_hours, x = date)) +
    geom_bar(position = "stack", stat = "identity") +
    labs(
      title = "Activities Recorded Daily",
      subtitle = "February 5 - February 24", 
      x = "Date", 
      y = "Cumulative Hours",
      caption = "Figure 1"
    ) +
    # Legend title 
    guides(fill = guide_legend(title = "Activity Type")) +
    theme_minimal()
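
The height of each bar in Figure 1 is the sum of the recorded non-sleep hours for that date, so the 15 to 20 hour range can also be checked numerically. The following is a minimal sketch on made-up rows (`toy_cal` and its values are hypothetical stand-ins for ‘mycal’, not my data).

```r
library(dplyr)

# Hypothetical rows standing in for mycal (not the project data)
toy_cal <- data.frame(
  date = as.Date(c("2025-02-05", "2025-02-05", "2025-02-05",
                   "2025-02-06", "2025-02-06")),
  activity_type = c("sleep", "classes", "entertainment", "sleep", "homework"),
  duration_hours = c(6, 3, 5.5, 6.5, 4)
)

# Total recorded non-sleep hours per day (the stacked bar heights)
daily_totals <- toy_cal |>
  filter(activity_type != "sleep") |>
  group_by(date) |>
  summarize(recorded_hours = sum(duration_hours), .groups = "drop")

print(daily_totals)
```

Applying the same pipeline to ‘mycal’ and calling range() on the recorded_hours column would give the shortest and longest recorded days.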

Figure 2: Sleep Trends

Figure 2 displays a line graph that tracks my hours of sleep over time. It uses a sequential color palette to show the level of tiredness for each day. The graph shows that I typically slept between 5 and 6.5 hours. However, there was an outlier in which I slept approximately 12 hours, likely because that day fell on a weekend. Overall, the line plot shows I was severely tired on most days when I slept between 5 and 7 hours. On the days that I felt fully alert, I slept over 6 hours. While this observation may indicate that I function optimally after at least 6 hours of sleep, there is not enough data to support this.

mycal_sleep |>
ggplot(aes(x = date, y = duration_hours, color = tiredness_level)) +
  geom_point() +
  geom_line(color = "gray") +
  # Sets y labels to intervals of 1
  scale_y_continuous(breaks = breaks_width(1)) + 
  labs(
    title = "Sleep Trends",
    subtitle = "February 5 - February 24",
    x = "Date", 
    y = "Hours of Sleep", 
    color = "Tiredness Level",
    caption = "Figure 2"
  ) +
  # Blue-purple sequential color scheme representing tiredness levels
  scale_color_brewer(type = "seq", palette = "BuPu") +
  theme_dark()
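
The relationship between sleep duration and tiredness can also be summarized in a small table instead of being read off the plot. The sketch below uses made-up values (`toy_sleep` is a hypothetical stand-in for ‘mycal_sleep’, not my data) and only illustrates the grouping step.

```r
library(dplyr)

# Hypothetical nights standing in for mycal_sleep (not the project data)
toy_sleep <- data.frame(
  duration_hours = c(5.0, 5.5, 6.5, 7.0, 6.25),
  tiredness_level = factor(
    c("Severely Tired", "Severely Tired", "Fully Alert",
      "Fully Alert", "Mildly Tired"),
    levels = c("Fully Alert", "Mildly Tired", "Moderately Tired",
               "Severely Tired")
  )
)

# Number of nights and sleep duration statistics for each tiredness level
sleep_by_level <- toy_sleep |>
  group_by(tiredness_level) |>
  summarize(
    nights = n(),
    mean_hours = mean(duration_hours),
    min_hours = min(duration_hours),
    max_hours = max(duration_hours),
    .groups = "drop"
  )

print(sleep_by_level)
```

The nights column makes it easy to see which tiredness levels have too few observations to support a conclusion.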

Conclusions

Overall, I learned my days are quite long; I am generally awake for 15-20 of the 24 hours in each day. A typical day is a fairly balanced mix of classes, entertainment, homework, meals, social hangouts, and naps. Regardless of how long my days were, I slept around 5 to 6.5 hours, which is insufficient given that I rated myself as severely tired on most days. February 16 was an unusual day in both figures: my day was shorter than normal, and I slept almost twice the typical amount. Going forward, I might spend my time trying to get more rest. I had to take a nap almost every day, which is a major indicator that I need more sleep. While it is unclear what my ideal amount of sleep might be, Figure 2 indicates that the days when I felt fully alert were days when I had over 6 hours of sleep. As such, I might try to get more than 6 hours of sleep.

Project Reflection

During the calendar query project, I faced several difficulties in collecting data through Google Calendar. One of my main hurdles in gathering accurate data was not knowing why some of the documented activities were absent from the .ics file. After a series of trial and error in Google Calendar, I realized that if the repeat option for an event was toggled on, that event and the following events would not show up in the .ics file. As such, I had to go through the calendar and redocument all the entries. Another main hurdle was not knowing how to categorize activities that were a mix of two categories. Take homework as an example: I may be doing homework while also watching a movie or show. In these situations, I relied on my intuition to determine which activity I was making more progress in (watching the movie or doing homework) and categorized the entry accordingly.

These difficulties have implications for future data collection and analysis projects, since they can lead to inaccurate or misleading analysis and visualization. Missing data can significantly alter one’s interpretation of the results and trends, which is why collecting data accurately and fully is necessary. Additionally, not categorizing or grouping things properly can also lead to less accurate results and interpretations.

A month of data collection would be sufficient to answer my questions of interest. It wouldn’t be hard to collect that data as long as I am consistent in proactively documenting activity and sleep times. I have few observations for tiredness levels other than ‘severely tired’, a gap that more collection time could fill. As someone who provides data, I expect the receiver to keep my data confidential and to share it only when given permission. As someone who analyzes others’ data, I have an ethical responsibility not to disclose information, since it can be highly confidential. Additionally, I am responsible for being objective and truthful in analyzing data; this means not manipulating data to alter the results and interpretations.

Sources

[1] R Factors, GeeksforGeeks, https://www.geeksforgeeks.org/r-factors/