Select a column of your data that encodes time (e.g., “date”, “timestamp”, “year”, etc.). Convert this into a Date in R.

Note: you may need some combination of as.Date or lubridate::as_datetime, and you may even need to paste year, month, day, hour, etc. together using paste() (even if you need to make up a month, like "__/01/01").
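
For example, if a table only records a year, a full date string can be assembled before parsing. A minimal sketch (df and year are hypothetical placeholder names):

# Hypothetical sketch: promote a year-only column to a full "YYYY-01-01" date
df$date <- as.Date(paste(df$year, "01", "01", sep = "-"))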

Our data dive this week focused on time series modeling using the garment worker productivity dataset. The goal was to begin preparing the data for time series analysis by working with the date column and understanding how productivity changes over time. Since the dataset includes daily records, it provides a strong foundation for exploring trends, patterns, and seasonal effects. However, daily data can show short-term fluctuations that make it harder to see broader patterns, so more attention was given to weekly, monthly, and quarterly groupings, which smooth out noise and reveal consistent trends and seasonal behavior over time.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(ggrepel)
library(xts)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## 
## ######################### Warning from 'xts' package ##########################
## #                                                                             #
## # The dplyr lag() function breaks how base R's lag() function is supposed to  #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
## # source() into this session won't work correctly.                            #
## #                                                                             #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
## # dplyr from breaking base R's lag() function.                                #
## #                                                                             #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
## #                                                                             #
## ###############################################################################
## 
## Attaching package: 'xts'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
library(tsibble)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## 
## Attaching package: 'tsibble'
## 
## The following object is masked from 'package:zoo':
## 
##     index
## 
## The following object is masked from 'package:lubridate':
## 
##     interval
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union
theme_set(theme_minimal())
data <- read.csv("C:/Users/rbada/Downloads/productivity+prediction+of+garment+employees/garments_worker_productivity.csv")
# Convert date column safely
library(lubridate)

data <- data %>%
  mutate(date = mdy(trimws(date))) %>%  # parse "m/d/yyyy" strings into Date objects
  filter(!is.na(date))                  # drop any rows whose dates failed to parse

summary(data$date)
##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## "2015-01-01" "2015-01-18" "2015-02-03" "2015-02-04" "2015-02-23" "2015-03-11"
# Average productivity per day, then fill in missing calendar days as NA
cleaned_data <- data %>%
  group_by(date) %>%
  summarise(avg_productivity = mean(actual_productivity, na.rm = TRUE)) %>%
  complete(date = seq(min(date, na.rm = TRUE), max(date, na.rm = TRUE), by = "day"))

head(cleaned_data)
## # A tibble: 6 × 2
##   date       avg_productivity
##   <date>                <dbl>
## 1 2015-01-01            0.719
## 2 2015-01-02           NA    
## 3 2015-01-03            0.803
## 4 2015-01-04            0.780
## 5 2015-01-05            0.796
## 6 2015-01-06            0.806
nrow(cleaned_data)
## [1] 70
summary(cleaned_data)
##       date            avg_productivity
##  Min.   :2015-01-01   Min.   :0.6341  
##  1st Qu.:2015-01-18   1st Qu.:0.6996  
##  Median :2015-02-04   Median :0.7316  
##  Mean   :2015-02-04   Mean   :0.7343  
##  3rd Qu.:2015-02-21   3rd Qu.:0.7723  
##  Max.   :2015-03-11   Max.   :0.8550  
##                       NA's   :11
weekly_productivity <- cleaned_data %>%
  mutate(week = floor_date(date, unit = "week")) %>%  # bin each date to its week's start (Sunday by default)
  group_by(week) %>%
  summarise(avg_productivity = mean(avg_productivity, na.rm = TRUE))

head(weekly_productivity)
## # A tibble: 6 × 2
##   week       avg_productivity
##   <date>                <dbl>
## 1 2014-12-28            0.761
## 2 2015-01-04            0.780
## 3 2015-01-11            0.742
## 4 2015-01-18            0.705
## 5 2015-01-25            0.785
## 6 2015-02-01            0.771
nrow(weekly_productivity)
## [1] 11
summary(weekly_productivity)
##       week            avg_productivity
##  Min.   :2014-12-28   Min.   :0.6701  
##  1st Qu.:2015-01-14   1st Qu.:0.7057  
##  Median :2015-02-01   Median :0.7408  
##  Mean   :2015-02-01   Mean   :0.7354  
##  3rd Qu.:2015-02-18   3rd Qu.:0.7657  
##  Max.   :2015-03-08   Max.   :0.7853
monthly_productivity <- cleaned_data %>%
  mutate(month = floor_date(date, unit = "month")) %>%
  group_by(month) %>%
  summarise(avg_productivity = mean(avg_productivity, na.rm = TRUE))

head(monthly_productivity)
## # A tibble: 3 × 2
##   month      avg_productivity
##   <date>                <dbl>
## 1 2015-01-01            0.754
## 2 2015-02-01            0.719
## 3 2015-03-01            0.718
nrow(monthly_productivity)
## [1] 3
summary(monthly_productivity)
##      month            avg_productivity
##  Min.   :2015-01-01   Min.   :0.7185  
##  1st Qu.:2015-01-16   1st Qu.:0.7188  
##  Median :2015-02-01   Median :0.7192  
##  Mean   :2015-01-31   Mean   :0.7305  
##  3rd Qu.:2015-02-15   3rd Qu.:0.7365  
##  Max.   :2015-03-01   Max.   :0.7538
quarterly_productivity <- cleaned_data %>%
  mutate(quarter = floor_date(date, unit = "quarter")) %>%
  group_by(quarter) %>%
  summarise(avg_productivity = mean(avg_productivity, na.rm = TRUE))

head(quarterly_productivity)
## # A tibble: 1 × 2
##   quarter    avg_productivity
##   <date>                <dbl>
## 1 2015-01-01            0.734
nrow(quarterly_productivity)
## [1] 1
summary(quarterly_productivity)
##     quarter           avg_productivity
##  Min.   :2015-01-01   Min.   :0.7343  
##  1st Qu.:2015-01-01   1st Qu.:0.7343  
##  Median :2015-01-01   Median :0.7343  
##  Mean   :2015-01-01   Mean   :0.7343  
##  3rd Qu.:2015-01-01   3rd Qu.:0.7343  
##  Max.   :2015-01-01   Max.   :0.7343

This week’s work focused on transforming the garment worker productivity dataset into a time series format to explore trends at multiple time scales. The process began by converting the original character-formatted date column (e.g., “1/1/2015”) into a proper Date object using lubridate::mdy(), which enabled accurate time-based grouping. Average productivity was then calculated at daily, weekly, monthly, and quarterly levels using group_by() and summarise(). To create a continuous time series, missing dates were filled at the daily level using tidyr::complete(), and the coarser aggregations were built from that continuous daily series. This ensured the inclusion of all time points within the dataset’s range, with NA values marking days that had no recorded productivity.

Create a tsibble object of just the date and response variable. Then, plot your data over time. Consider different windows of time.

What stands out immediately?

productivity_ts <- cleaned_data %>%
  as_tsibble(index = date)
ggplot(productivity_ts, aes(x = date, y = avg_productivity)) +
  geom_line(color = "steelblue", linewidth = 1) +
  labs(title = "Daily Productivity Over Time (tsibble)",
       x = "Date", y = "Average Productivity") +
  theme_minimal()

The plot shows daily changes in worker productivity across the time period. There are noticeable ups and downs, with a few sharp dips in mid-January and February. A short-term increase appears in early February before productivity drops again. The overall pattern is irregular, and some missing values (gaps) are visible, suggesting days without recorded data. This irregularity stands out immediately and shows the need to explore trends at weekly or monthly levels for a clearer view.
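
Those gaps can be counted directly. A quick check, which should agree with the 11 NA days reported in the summary above:

# Count calendar days with no recorded productivity
sum(is.na(productivity_ts$avg_productivity))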

weekly_ts <- weekly_productivity %>%
  as_tsibble(index = week)

ggplot(weekly_ts, aes(x = week, y = avg_productivity)) +
  geom_line(color = "darkgreen", linewidth = 1) +
  labs(
    title = "Weekly Average Productivity Over Time",
    x = "Week",
    y = "Average Productivity"
  ) +
  theme_minimal()

The weekly productivity plot shows clear ups and downs throughout the observed period. Productivity peaked around early February, dropped significantly in mid-February, and began to rise again in March. This smoothed view helps highlight broader trends that may be missed in daily data and can support better planning or scheduling decisions based on weekly patterns.

monthly_ts <- monthly_productivity %>%
  as_tsibble(index = month)

ggplot(monthly_ts, aes(x = month, y = avg_productivity)) +
  geom_line(color = "tomato", linewidth = 1) +
  labs(
    title = "Monthly Average Productivity Over Time",
    x = "Month",
    y = "Average Productivity"
  ) +
  theme_minimal()

The monthly productivity plot shows a steady decline from January through March. Productivity was highest in January but dropped in February and remained low into early March. This trend suggests a possible slowdown in performance, which may reflect seasonal effects, workload shifts, or operational challenges. Monthly aggregation smooths short-term variation and is useful for observing overall performance trends across longer periods.

quarterly_ts <- quarterly_productivity %>%
  as_tsibble(index = quarter)
ggplot(quarterly_ts, aes(x = quarter, y = avg_productivity)) +
  geom_point(color = "blue", size = 3) +
  labs(
    title = "Quarterly Average Productivity (Q1 only)",
    x = "Quarter",
    y = "Average Productivity"
  ) +
  theme_minimal()

The quarterly plot shows a single data point representing productivity for Q1 2015, with an average around 0.74. Since the dataset only covers January through early March, no other quarters are available. This view gives a general summary of overall productivity for that period, but more quarters would be needed to observe long-term changes or seasonal trends.

Daily, weekly, monthly, and quarterly productivity were compared to understand how performance changes over time. Daily data captured detailed changes but was often noisy and harder to interpret. Weekly averages provided a clearer view of short-term trends while reducing daily fluctuations. Monthly summaries helped highlight longer-term patterns and are useful for reporting. Quarterly data included only one value in this case, so it was not helpful for deeper analysis. Overall, weekly aggregation offered the best balance between detail and clarity, allowing easier interpretation of productivity trends and supporting better decision-making.

Use linear regression to detect any upwards or downwards trends.

Use smoothing to detect at least one season in your data, and interpret your results.

Can you illustrate the seasonality using ACF or PACF?
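
The linear trend fit referenced in the conclusion does not appear in the output above; a minimal sketch of how it could be run on the daily cleaned_data (lm() silently drops the NA days):

# Fit a simple linear trend: productivity as a function of time
trend_model <- lm(avg_productivity ~ as.numeric(date), data = cleaned_data)
summary(trend_model)  # the sign of the slope coefficient indicates trend direction

# Overlay the fitted trend on the daily points
ggplot(cleaned_data, aes(x = date, y = avg_productivity)) +
  geom_point(size = 1, color = "gray") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Daily Productivity with Linear Trend",
       x = "Date", y = "Average Productivity")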

clean_data <- productivity_ts %>%
  filter(is.finite(avg_productivity) & !is.na(date))
ggplot(clean_data, aes(x = date, y = avg_productivity)) +
  geom_point(size = 1, color = "gray") +
  geom_smooth(method = "loess", span = 0.2, se = FALSE, color = "blue") +
  labs(
    title = "Smoothed Daily Productivity (LO(W)ESS)",
    x = "Date",
    y = "Average Productivity"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

The LO(W)ESS smoothing line revealed a clear pattern of ups and downs in daily productivity. These repeating waves suggest a weekly seasonality, where productivity tends to rise and fall in a regular cycle. This kind of pattern could be related to work schedules, fatigue near the end of the week, or operational routines. LO(W)ESS helped smooth out the noise and made these patterns easier to see.
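
One way to probe that weekly cycle more directly is to average productivity by day of the week. A minimal sketch using lubridate::wday():

# Average productivity by day of week; a weekday with no records at all returns NaN
cleaned_data %>%
  mutate(day_of_week = wday(date, label = TRUE)) %>%
  group_by(day_of_week) %>%
  summarise(avg_productivity = mean(avg_productivity, na.rm = TRUE))

If one weekday consistently dips (or never appears because it is a non-working day), that supports the weekly seasonality suggested by the smoother.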

Can you illustrate the seasonality using ACF or PACF?

library(xts)
productivity_xts <- xts(productivity_ts$avg_productivity, order.by = productivity_ts$date)
acf_result <- acf(productivity_ts$avg_productivity, plot = FALSE, na.action = na.exclude)
acf_data <- data.frame(
  lag = acf_result$lag,
  acf = acf_result$acf
)
head(acf_data)
##   lag       acf
## 1   0 1.0000000
## 2   1 0.5842230
## 3   2 0.4363752
## 4   3 0.4552327
## 5   4 0.5011326
## 6   5 0.4019822

ACF shows a strong positive correlation at lag 1 (0.58), meaning productivity on one day is closely related to the previous day. The values gradually decrease over time, suggesting a short-term pattern in productivity. This indicates that workers tend to follow a consistent routine from day to day, even though daily data may have some noise.
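
A useful benchmark for these values is the approximate 95% white-noise bound of ±1.96/√n, the dashed lines drawn on the ACF plot. A quick calculation on the non-missing daily observations:

# Approximate 95% significance bound for the ACF under a white-noise null
n_obs <- sum(!is.na(productivity_ts$avg_productivity))
1.96 / sqrt(n_obs)

With 59 non-missing days this bound is roughly 0.26, so the lag-1 value of 0.58 sits well above it.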

acf(productivity_xts, main = "Autocorrelation (ACF) of Daily Productivity", na.action = na.exclude)

pacf_result <- pacf(productivity_ts$avg_productivity, plot = FALSE, na.action = na.exclude)
pacf_data <- data.frame(
  lag = pacf_result$lag,
  pacf = pacf_result$acf
)
head(pacf_data)
##   lag         pacf
## 1   1  0.584223046
## 2   2  0.144316111
## 3   3  0.236867868
## 4   4  0.229034889
## 5   5 -0.006824074
## 6   6 -0.193924108

PACF also shows a strong spike at lag 1, but the values drop more quickly after that. This means that most of the relationship is explained by the immediate previous day. Later lags do not add much new information once the effect of lag 1 is considered. This supports the idea that recent productivity influences the current day more than longer gaps in time.
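
A sharp PACF cutoff after lag 1 is the classic signature of an AR(1) process. A minimal sketch of checking this with stats::arima(), which can handle the missing days via its Kalman filter:

# Fit an AR(1) model, consistent with the PACF cutting off after lag 1
ar1_model <- arima(productivity_ts$avg_productivity, order = c(1, 0, 0))
ar1_model  # the ar1 coefficient should be positive, echoing the strong lag-1 ACF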

pacf(productivity_xts, main = "Partial Autocorrelation (PACF) of Daily Productivity", na.action = na.exclude)

Even though weekly data was more useful for seeing overall trends, daily data was still included to help explore short-term changes. Tools like ACF and PACF work best with daily data and can show whether productivity repeats in a regular pattern, like weekly cycles. The ACF plot showed some correlation between days, which means productivity levels may be related from one day to the next, while the PACF plot helped show which lags had the strongest direct effect. Looking at daily data this way helps find patterns that might not be clear in weekly or monthly data, and gives a better idea of how productivity changes from day to day and whether there are repeated patterns in the work process.

Conclusion

This time series analysis aimed to understand worker productivity trends in the garment industry over time. Data were summarized at daily, weekly, monthly, and quarterly levels, with weekly summaries offering the best balance for identifying trends. A linear regression model indicated a slight downward trend in productivity, particularly during February. LO(W)ESS smoothing revealed repeating patterns that suggested weekly cycles in performance. These findings were supported by the autocorrelation function (ACF), which showed strong correlations at short lags. The partial autocorrelation function (PACF) confirmed that the strongest direct influence came from the previous day, indicating a short-term routine in productivity. Together, these tools provided a comprehensive view of both long-term trends and short-term patterns in worker performance.