Introduction

Starting college can be both exciting and stressful because of the number of new people, hard classes, and novel activities. In this report, I analyze how the time I spent on extracurricular activities changed between the second semester of high school and the second semester of college, whether there was a correlation between the number of hours spent exercising and productivity, and which day of the week I studied the most. These insights are valuable for planning my workload and optimizing my schedule to achieve the highest productivity in college, based on the experience I have collected throughout this time.

Methods

Data collection

Every week I plan my schedule ahead of time by adding lectures, club meetings, or sport sessions to my Google Calendar. To track the rest of my time spent on sports or learning that is not covered by the initial schedule, I adjust the times in my calendar as needed, the same way I did in high school. Studying consists of lectures, homework assignments, study sessions, and work. I count as sports everything related to physical exercise, such as running, yoga, hiking, and gym sessions. My extracurriculars are activities that I do in my free time, such as club meetings, hall events, online courses, and internship search.

Data wrangling

I started by splitting the data by date to make two separate datasets, one for high school and one for college, each consisting of observations over a 3-week period. To answer each question of interest, I assigned every activity to a category (sports, extracurriculars, or studying) by searching for common character sequences, such as "h/w", "running", or "figure skating". The category is stored in a separate column, so the original name of each observation is preserved. Since I attended high school in Slovakia, I converted the times of my high school activities to the local timezone. To answer my questions, I summarized the number of hours spent doing homework and engaging in extracurricular activities by day. I use this data to plot side-by-side bar charts. To answer the question on the correlation between sports and productivity, I create a scatter plot. Finally, I extracted the data relevant to each research question into separate dataframes to make it easily accessible.
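The timezone adjustment mentioned above can be illustrated with a minimal sketch. It assumes lubridate is available and uses a made-up event (the timestamps are hypothetical, not taken from my calendar). It shows that `with_tz()` keeps the same instant and only changes the displayed clock time, so the duration is unaffected, while the calendar date can differ between zones.

```r
library(lubridate)

# Hypothetical late-evening event stored in UTC (made-up values)
start_utc <- ymd_hms("2024-02-05 23:30:00", tz = "UTC")
end_utc   <- ymd_hms("2024-02-06 01:00:00", tz = "UTC")

# with_tz() keeps the instant and changes only the displayed clock time
start_ny <- with_tz(start_utc, tzone = "America/New_York")
start_sk <- with_tz(start_utc, tzone = "Europe/Bratislava")

# The calendar date depends on the timezone in effect
date_ny <- date(start_ny)
date_sk <- date(start_sk)

# The duration in hours is the same regardless of the display timezone
duration_hours <- interval(start_utc, end_utc) / hours(1)

print(c(new_york = format(date_ny), bratislava = format(date_sk)))
print(duration_hours)
```

Because the date can shift, columns derived from the timestamps (such as the date or the weekday) depend on which timezone is in effect when they are computed.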

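# Packages used throughout the analysis (assumed setup)
library(tidyverse)   # dplyr, tidyr, ggplot2, stringr
library(lubridate)   # with_tz(), interval(), wday()
library(ical)        # ical_parse_df()
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), column_spec(), row_spec()

cal_import <- ical_parse_df("Data/calendar_Koval.ics")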

# Data wrangling
mycal <- cal_import |>
  rename(activity = summary) |>
  mutate(
    across(c(start, end), 
           .fns = with_tz, 
           tzone = "America/New_York"),
    # Compute duration of each activity in hours
    duration_hours = interval(start, end) / hours(1),

    date = date(start),
    weekday_label = wday(start, 
                         label = TRUE, 
                         abbr = FALSE),
   
    across(c(activity, description), 
           .fns = str_to_lower),
    across(c(activity, description), 
           .fns = str_squish)
  )

# Split the data into two dataframes
high_school <- mycal |> 
  filter(between(date, as.Date("2024-01-27"), as.Date("2024-02-21")))
college <- mycal |> 
  filter(between(date, as.Date("2025-01-27"), as.Date("2025-02-21")))

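# Convert high school times to the local timezone (Europe/Bratislava)
# and recompute the date and weekday so that they reflect local time
high_school <- high_school |>
  mutate(
    across(c(start, end), 
           .fns = with_tz, 
           tzone = "Europe/Bratislava"),
    date = date(start),
    weekday_label = wday(start, label = TRUE, abbr = FALSE)
  )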

# Assign each activity to a category: studies, sport, or extracurricular

high_school <- high_school |>
  mutate(category = if_else(
    activity == "calculus" | 
    activity == "character seminar 4" | 
    activity == "entrepreneurial leadership" | 
    grepl("homework|h/w|ap|slovak|hw", activity), "studies",
    
    if_else(
      activity == "yoga" | 
      activity == "running" | 
      activity == "gym", 
      "sport",  
      "extracurricular"
    )
  ))

college <- college |>
  mutate(category = if_else(
    activity == "figure skating" | 
    activity == "running" | 
    activity == "gym" | 
    activity == "hiking", "sport", 
    
    if_else(
      grepl("homework|h/w|science|stats|econ|sds|work|statistics", activity), 
      "studies", 
      "extracurricular"
    )
  ))

# Rename observations with similar meaning (the first matching rule wins)

college <- college |>
  mutate(activity = case_when(
    grepl("e class|e-class|english", activity) ~ "English classes",
    grepl("mun|model|position", activity) ~ "Model United Nations",
    # "position" is already matched by the rule above, so it never reaches this one
    grepl("internship|lecture|position|sas|alumni|data|study away|resume",
          activity) ~ "Internships",
    grepl("cise|tea|house", activity) ~ "Amherst events",
    TRUE ~ activity
  ))

high_school <- high_school |>
  mutate(activity = case_when(
    grepl("e class|e-class|english|yes|алена", activity) ~ "English classes",
    grepl("assembly|advisory|hall", activity) ~ "Hall events",
    grepl("mep", activity) ~ "Model European Parliament",
    TRUE ~ activity
  ))

#Preparing data to graph extracurricular time by semester

extracurricular_time <- bind_rows(high_school, college)|>
  filter(category == "extracurricular") |>
  select(date, activity, duration_hours, weekday_label) |> 
  mutate(
    # Same date window as the high_school filter above (through 2024-02-21)
    semester = if_else(between(date, as.Date("2024-01-27"),
                               as.Date("2024-02-21")),
                       "High School Second Semester", 
                       "College Second Semester")) |>
  group_by(date, semester, activity, weekday_label) |>
  summarise(duration_hours = sum(duration_hours), .groups = "drop")
  
#Summing the duration of each grouped activity by day

combined_data <- extracurricular_time |>
  group_by(weekday_label, activity, semester) |>
  summarise(total_duration = sum(duration_hours), .groups = "drop")

Results

Extracurricular Activities in High School and College

Based on the visualizations of time spent on extracurricular activities in high school and college, I can make a few observations. In total, I spent 40 hours on extracurricular activities during the tracked period of my second semester of college, and 36.75 hours during the tracked period in high school. The bar charts suggest that in college I had 5 different activities outside of classes, while in high school I had only 3. Notably, I spent more time on activities organized by my high school than on Amherst College events. I was also more engaged in Model United Nations in college than in Model European Parliament in high school.

ggplot(combined_data, aes(x = weekday_label, y = total_duration, 
                          fill = activity)) +
  geom_col(color = "black", alpha = 0.8) +  # Stacked bars of pre-summed hours
  facet_wrap(~semester) +  
  labs(title = "Extracurricular Activities by Day of the Week",
       x = "Day of the Week", y = "Total Hours",
       fill = "Activity") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Numerical summary of total time spent on extracurricular activities
total_hours <- extracurricular_time |>
  group_by(semester) |>
  summarise(total_duration = sum(duration_hours), .groups = "drop")

print(total_hours)

Correlation of Sports and Productivity

There appears to be a very weak negative relationship, or none at all, between the duration of sports sessions and the number of hours spent studying: the trendline has a slope close to 0. One outlier on the right side of the graph represents a 6-hour sports session. It is associated with no study time, which may suggest that exhausting physical activity reduces overall efficiency, but more data is needed to test this hypothesis.
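The daily totals behind this comparison can be sketched as follows. This is an illustration under stated assumptions, not the exact code used for the report: the `activities` tibble holds made-up values, and the column names `sport_duration` and `study_duration` are chosen to match the plot. Days with only extracurricular activities are dropped by the filter.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the categorized calendar data (values are made up)
activities <- tibble(
  date = as.Date(c("2025-02-03", "2025-02-03", "2025-02-04",
                   "2025-02-05", "2025-02-05")),
  category = c("sport", "studies", "studies", "sport", "extracurricular"),
  duration_hours = c(1, 2.5, 3, 6, 1.5)
)

# One row per day: total sport hours and total study hours
# (a day missing one of the two categories gets 0 instead of NA)
sports_vs_productivity <- activities |>
  filter(category %in% c("sport", "studies")) |>
  group_by(date, category) |>
  summarise(hours = sum(duration_hours), .groups = "drop") |>
  pivot_wider(names_from = category, values_from = hours, values_fill = 0) |>
  rename(sport_duration = sport, study_duration = studies)

# Pearson correlation between daily sport hours and daily study hours
sport_study_cor <- cor(sports_vs_productivity$sport_duration,
                       sports_vs_productivity$study_duration)

print(sports_vs_productivity)
print(sport_study_cor)
```

A correlation coefficient near 0 would agree with a nearly flat trendline, although with only a few weeks of data a single long session can shift it noticeably.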

ggplot(sports_vs_productivity, aes(x = sport_duration, 
                                   y = study_duration)) +
  geom_point(color = "blue", alpha = 0.6) +  # Scatter plot for points
  geom_smooth(method = "lm", formula = y ~ x, 
              color = "red") +  # Adding a regression line
  labs(title = "Correlation Between Sports and Study Duration",
       x = "Total Hours Spent on Sports",
       y = "Total Hours Spent on Studies") +
  theme_minimal()

Summary Table

The summary table suggests that the mean duration of an extracurricular activity increased from 1.05 hours in high school to 1.43 hours in college. The average duration of a study session increased from 1.42 to 1.7 hours. The table also shows that I studied less on days when I exercised than on days when I didn’t. In college I studied on average 0.6 hours more on non-sport days (3.13 versus 2.53 hours), while in high school the difference was 0.37 hours (2.25 versus 1.88 hours). Thursday was my hardest day in terms of academic workload in high school, but in college it is now Saturday, so I tend to allocate a lot of time to finishing my assignments on the weekends.
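One way to find the weekday with the longest mean study time is sketched below. It is a hedged illustration on made-up data, and it assumes that "mean study time" means the average of daily study totals for each weekday; the names `study_sessions`, `mean_study_by_weekday`, and `busiest_day` are hypothetical.

```r
library(dplyr)
library(lubridate)

# Toy study sessions (made-up values): two Saturdays and two Mondays
study_sessions <- tibble(
  date = as.Date(c("2025-02-01", "2025-02-01", "2025-02-03",
                   "2025-02-08", "2025-02-10")),
  duration_hours = c(2, 1.5, 1, 4, 2)
)

# Step 1: total study hours per calendar day
# Step 2: average those daily totals within each weekday
mean_study_by_weekday <- study_sessions |>
  group_by(date) |>
  summarise(daily_hours = sum(duration_hours), .groups = "drop") |>
  mutate(weekday_label = wday(date, label = TRUE, abbr = FALSE)) |>
  group_by(weekday_label) |>
  summarise(mean_daily_hours = mean(daily_hours), .groups = "drop")

# Weekday with the highest mean daily study time
busiest_day <- mean_study_by_weekday |>
  slice_max(mean_daily_hours, n = 1)

print(busiest_day)
```

Averaging daily totals rather than individual sessions keeps a day with many short sessions from looking lighter than a day with one long session.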

#Combining table data with values of interest(mean values)
summary_stats <- bind_rows(
  
  college |> filter(category == "extracurricular") |> 
    summarise(activity = "Extracurriculars", 
              avg_duration = mean(duration_hours), 
              semester = "College Second Semester"),
  
  high_school |> filter(category == "extracurricular") |> 
    summarise(activity = "Extracurriculars", 
              avg_duration = mean(duration_hours), 
              semester = "High School Second Semester"),
  
  college |> filter(category == "studies") |> 
    summarise(activity = "Studies", 
              avg_duration = mean(duration_hours), 
              semester = "College Second Semester"),
  
  high_school |> filter(category == "studies") |> 
    summarise(activity = "Studies", 
              avg_duration = mean(duration_hours), 
              semester = "High School Second Semester"),
  
  sports_vs_productivity |> 
    # Same date window as the high_school filter (through 2024-02-21)
    mutate(semester = if_else(between(date, as.Date("2024-01-27"),
                                      as.Date("2024-02-21")),
                              "High School Second Semester", 
                              "College Second Semester")) |> 
    group_by(semester, sport_day = sport_duration > 0) |> 
    summarise(avg_duration = mean(study_duration), .groups = "drop") |> 
    mutate(activity = if_else(sport_day, "Studies (Sports Days)", 
                              "Studies (Non-Sports Days)")) |> 
    select(-sport_day)
)

#Rearranging the data to divide by college and high school
summary_table <- summary_stats |> 
  pivot_wider(
    names_from = semester,
    values_from = avg_duration
  ) |> 
  rename(
    "Average Time Spent On" = activity,
  )

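# Hand-entered rounded values for the display table, plus the weekday row
# (this replaces the pivoted summary_table above). Because each column mixes
# numbers with a weekday name, c() stores every value as text.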
summary_table <- data.frame(
  Activity = c("Extracurriculars", "Studies",
               "Studies (Non-Sports Days)",
               "Studies (Sports Days)", 
               "Day with longest mean study time"),
  College_Second_Semester = c(1.43, 1.70, 3.13, 2.53, "Saturday"),
  High_School_Second_Semester = c(1.05, 1.42, 2.25, 1.88, "Thursday")
)

kable(summary_table, booktabs = TRUE, align = "c",
      col.names = c("Activity", "College", "High School")) |>
  kable_styling(latex_options = c("striped")) |>
  column_spec(1, bold = TRUE) |>
  row_spec(0, bold = TRUE, color = "white", background = "gray")

Activity                            College     High School
Extracurriculars                    1.43        1.05
Studies                             1.7         1.42
Studies (Non-Sports Days)           3.13        2.25
Studies (Sports Days)               2.53        1.88
Day with longest mean study time    Saturday    Thursday

Conclusions

This study was essential for me to understand how my time management habits, interests, and workload changed between high school and college. I found out that the average duration of my extracurricular activities in college is longer, and I now tend to have a wider variety of activities. These realizations inspired me to keep diversifying my hobbies and to try new experiences despite academic work. Saturday is my busiest day for studying, which means I should spread some study sessions more evenly throughout the week to get enough rest on the weekends. Initially, I hypothesized that exercise increases energy and leads to more productive work, but this hypothesis wasn’t supported by the table and graph, which showed a weak negative association between time spent on sports and time spent studying. However, more data is needed to confirm this result.

Reflection

Collecting data is an organized and structured process. It was important to plan ahead, identify the main variables of interest, and track them efficiently. Since I wanted to compare my schedule across high school and college, I needed to account for the fact that, even though I used my calendar consistently, my data wasn’t collected as carefully while I was in high school. To handle this obstacle, I focused on the variables for which I had the most data and the most control in both semesters. The second challenge that I faced was tracking data accurately. For example, I was intentional with my study time and minimized the number of distractions to collect data precisely. However, this is not always the case in real life, because I can spend my time on other things, such as scrolling on social media and talking to friends, without even noticing it. In the future, I will make assumptions about such inconsistencies ahead of time and define a way to track detailed information if my project needs it. To do that, I can additionally use my screen time data or simply log any distractions that arise, without trying to minimize them, to replicate the real-world setting.

The data collection process depends on the project at hand. For instance, selecting variables of interest for my Calendar Query project was key to not losing time collecting unessential data, so I will keep putting a lot of effort into preliminary planning. I am also considering data from other sources, such as apps. The Strava app has my exercise data, which I can combine with health metrics collected through my smartwatch or with health data from Flo. I consider this information valuable because it lets me keep track of the information I am targeting, doesn’t take a lot of time to collect, and provides details on multiple levels.

When I use apps that collect data, I expect that data to remain confidential and private, and I hope that nobody gains or buys access to my personal information. I follow the same principles when analyzing someone else’s data. I would first make sure that I am allowed to study the provided data on legal and ethical grounds. If it is allowed, I will make sure I don’t draw any wrong conclusions that can jeopardize someone’s reputation, and I will not use sensitive data that should remain confidential. I am committed to ensuring high-quality ethical analysis that accounts for different perspectives, remains unbiased, and can be replicated. Such an approach is crucial to improving my technical expertise, delivering clear answers, and helping people in various industries.

Sources

Source - Table formatting.

Source - grepl function to look for common sequence of characters.

Comparing Productivity Across Semesters

Calendar Query

Mariia Koval