Load CSV file

Read the CSV file into the garment_prod data frame.

garment_prod <- read.csv("/Users/lakshmimounikab/Desktop/Stats with R/R practice/garment_prod.csv")
# team is an identifier, so store it as character rather than numeric
garment_prod$team <- as.character(garment_prod$team)
View(garment_prod)

Load required libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(ggrepel)
library(ggplot2)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
# time series toolkits
library(xts)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## 
## ######################### Warning from 'xts' package ##########################
## #                                                                             #
## # The dplyr lag() function breaks how base R's lag() function is supposed to  #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
## # source() into this session won't work correctly.                            #
## #                                                                             #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
## # dplyr from breaking base R's lag() function.                                #
## #                                                                             #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
## #                                                                             #
## ###############################################################################
## 
## Attaching package: 'xts'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
library(tsibble)
## 
## Attaching package: 'tsibble'
## 
## The following object is masked from 'package:zoo':
## 
##     index
## 
## The following object is masked from 'package:lubridate':
## 
##     interval
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union

Identifying time variable

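In my data set, there is a column named “date” which has values in ‘mm/dd/yyyy’ format. The code below does not parse those strings. Instead, it overwrites the column with a sequence of consecutive daily dates starting at 2015-01-01, one per row, stored as Date values in “yyyy-mm-dd” format. This gives every row a unique date index, but the dates no longer correspond to the recorded calendar dates.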

prod <- garment_prod
prod$date <- seq(as.Date("2015-01-01"), by = "days", length.out = nrow(prod))
View(prod)
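
If the recorded calendar dates are needed rather than a generated sequence, the ‘mm/dd/yyyy’ strings can be parsed with as.Date and an explicit format. This is a minimal sketch on a small made-up vector (raw_dates is hypothetical and not taken from garment_prod.csv):

```r
# Hypothetical sample values in mm/dd/yyyy format (not from the real file)
raw_dates <- c("1/1/2015", "1/3/2015", "12/31/2015")

# %m = month, %d = day of month, %Y = four-digit year
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Date objects are displayed in ISO yyyy-mm-dd form
format(parsed)
class(parsed)
```

Parsed dates may repeat or skip days. If several rows share a date, as_tsibble() would also need a key so that each key and index pair is unique.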

Response-like variable

The “actual_productivity” column is a natural response-like variable to analyze over time. It records the productivity the workers actually achieved, which is an operational metric that management would want to track.

Tsibble object

prod_ts <- as_tsibble(prod, index=date) |>
  fill_gaps()
View(prod_ts)

Xts object

prod_xts <- xts(x = prod_ts$actual_productivity, 
                  order.by = prod_ts$date)
prod_xts <- setNames(prod_xts, "Act_Prod")
View(prod_xts)

Plotting

To view the variation of actual_productivity over time, we plot a line graph.

prod_xts %>%
  ggplot(mapping = aes(x = Index, y = Act_Prod)) +
  geom_line() +
  labs(title = "Actual_Productivity Over Time",
       subtitle = "Overall trends of Actual productivity") +
  theme_hc()

The plot shows actual productivity rising overall, with visible fluctuations along the way. Because the dates are a generated daily sequence rather than the recorded ones, this is best read as a pattern across consecutive records in this data set, not as evidence about the garment industry as a whole.

Taking subset of tsibble data

Here we take the subset of the data for 2016 only and plot it as a line chart with a fitted linear trend line.

prod_ts |>
  filter_index("2016-01" ~ "2016-12") |>
  ggplot(mapping = aes(x = date, y = actual_productivity)) +
  geom_line() +
  geom_smooth(method = 'lm', color = 'blue', se=FALSE) +
  labs(title = "Actual Productivity trend in year 2016") +
  theme_hc()
## `geom_smooth()` using formula = 'y ~ x'

The plot shows a generally increasing trend in actual productivity during 2016, with fluctuations throughout the year. The fitted line summarizes the direction of the trend only. With se = FALSE the plot gives no indication of uncertainty, so it does not by itself show that the increase is statistically significant.
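
Whether a linear trend is statistically significant can be checked by fitting the regression directly and looking at the slope's p-value. The sketch below uses simulated data (sim is a synthetic stand-in, and the slope and noise level are arbitrary assumptions), so it illustrates the procedure rather than a result for the garment data:

```r
# Synthetic stand-in for one year of daily productivity (not the real data)
set.seed(42)
dates <- seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = "day")
sim <- data.frame(
  date = dates,
  actual_productivity = 0.70 + 0.0003 * seq_along(dates) +
    rnorm(length(dates), sd = 0.05)
)

# Regress productivity on time; as.numeric(date) counts days since 1970-01-01
fit <- lm(actual_productivity ~ as.numeric(date), data = sim)

# Row 2 of the coefficient table holds the slope estimate and its p-value
slope_row <- coef(summary(fit))[2, ]
slope_row[c("Estimate", "Pr(>|t|)")]
```

For autocorrelated series the p-value from lm tends to be optimistic, so it should be treated as a rough guide.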

Plotting the trend seasonally

Next, we aggregate the series by taking the mean value for every half year and plot the result.

prod_ts |>
  # group dates into half-year bins and average within each bin
  index_by(half_year = floor_date(date, 'halfyear')) |>
  summarise(avg_AP = mean(actual_productivity, na.rm = TRUE)) |>
  ggplot(mapping = aes(x = half_year, y = avg_AP)) +
  geom_line() +
  geom_smooth(span = 0.3, color = 'blue', se = FALSE) +
  labs(title = "Average Actual Productivity Over Time",
       subtitle = "(by half year)") +
  scale_x_date(breaks = "1 year", labels = \(x) year(x)) +
  theme_hc()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : span too small.  fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : pseudoinverse used at 16431
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : neighborhood radius 186.48
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric,
## : There are other near singularities as well. 35903


The plot shows average actual productivity by half year, and the averages rise over time. The loess warnings above indicate that there are too few half-year averages for a span of 0.3, so the smoothed line adds little beyond the plain line here. The upward movement describes this data set and should not be generalized to the garment industry as a whole.
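
To see why the smoother struggles, it helps to count how many points remain after half-year aggregation. The sketch below uses base R, a hypothetical row count (n_rows) and random values in place of the real column, so only the bin count is meaningful:

```r
n_rows <- 1000  # hypothetical row count, not the size of the real data set
dates <- seq(as.Date("2015-01-01"), by = "day", length.out = n_rows)

# Label each date with its year and half (H1 = Jan to Jun, H2 = Jul to Dec)
yr <- as.integer(format(dates, "%Y"))
half <- ifelse(as.integer(format(dates, "%m")) <= 6, "H1", "H2")
half_year <- paste(yr, half)

# Random placeholder values standing in for actual_productivity
set.seed(1)
values <- runif(n_rows, 0.4, 1.0)

# One mean per half-year bin
half_means <- tapply(values, half_year, mean)
length(half_means)
```

Roughly a thousand daily rows collapse to only six averages. A loess fit with span = 0.3 then works with about two points per neighborhood, which is fewer than a local quadratic needs.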

Smoothing

prod_ts |>
  filter_index("2016" ~ "2017") |>
  drop_na() |>
  ggplot(mapping = aes(x = date, y = actual_productivity)) +
  geom_point(size=1, shape='O') +
  geom_smooth(span=0.2, color = 'blue', se=FALSE) +
  labs(title = "Actual Productivity during 2016-2017") +
  theme_hc()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

The plot shows the individual observations for 2016 to 2017 together with a loess trend line fitted with a span of 0.2. The smoothed line rises over the period, which suggests that productivity in this data set improved across these two years.
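
The span controls how much of the data each local fit uses, so a small span follows the data closely while a large span gives a smoother curve. This can be sketched with base R's loess(), which is the method geom_smooth() reported using above. The data here are simulated and the roughness measure (rough) is an ad hoc helper introduced only for this illustration:

```r
# Simulated noisy curve (not the garment data)
set.seed(7)
x <- 1:200
y <- sin(x / 20) + rnorm(200, sd = 0.3)

# Same data, two spans: 0.2 as in the plot above, 0.75 is the loess() default
fit_narrow <- loess(y ~ x, span = 0.2)
fit_wide <- loess(y ~ x, span = 0.75)

# Ad hoc roughness measure: mean absolute second difference of the fitted curve
rough <- function(fit) mean(abs(diff(fitted(fit), differences = 2)))
c(narrow = rough(fit_narrow), wide = rough(fit_wide))
```

The narrower span produces the wigglier curve, so span = 0.2 tracks short-term movements at the cost of also following some noise.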

ACF and PACF

# ACF plot
ggAcf(prod_ts$actual_productivity) +
  labs(title = "Autocorrelation Function (ACF)")

The ACF plot of the garment productivity data set shows a significant positive autocorrelation at lag 1, which means that the current productivity is positively correlated with the productivity of the previous day. Productivity on one day is therefore likely to be similar to productivity on the previous day.

The autocorrelation at lag 1 is above 0.50, which is relatively high and indicates strong day-to-day persistence. The autocorrelation at lag 2 is 0.37, which is also significant, so productivity is still correlated with the value two days earlier. This is not evidence of a weekly pattern, which in daily data would appear as a spike at lag 7.

The autocorrelation at higher lags is not significant, which means that there is no evidence of longer-range dependence in the productivity data.
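
How a weekly pattern shows up in an ACF can be illustrated with base R's acf() on a simulated daily series. Everything here is made up for illustration: the weekly profile, the noise level and the series length are assumptions.

```r
# Simulated daily series with a repeating 7-day profile plus noise
set.seed(123)
n <- 700
weekly <- rep(c(0.1, 0.05, 0, 0, -0.05, -0.1, 0), length.out = n)
sim <- weekly + rnorm(n, sd = 0.05)

# Sample autocorrelations for lags 1 to 14 (lag 0 dropped)
r <- acf(sim, lag.max = 14, plot = FALSE)$acf[-1]

# Approximate 95% significance bound under a white-noise null
bound <- qnorm(0.975) / sqrt(n)

# Compare the autocorrelation at lags 1, 2 and 7 with the bound
round(c(lag1 = r[1], lag2 = r[2], lag7 = r[7], bound = bound), 2)
```

In this simulated series the largest autocorrelation is at lag 7, the length of the cycle. A significant lag 2 on its own points to short-term persistence, not weekly seasonality.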

# PACF plot
ggPacf(prod_ts$actual_productivity) +
  labs(title = "Partial Autocorrelation Function (PACF)")

The partial autocorrelation function (PACF) plot shows the correlation between the current value of the time series and previous values, after controlling for the effect of all intermediate values. In other words, the PACF plot shows the unique contribution of each lag to the correlation with the current value.

The PACF plot of the garment productivity data set shows a significant positive partial autocorrelation at lag 1, which is consistent with the ACF plot. At lag 1 there are no intermediate values to control for, so the PACF and ACF values coincide: current productivity is positively correlated with the productivity of the previous day.

The PACF plot also shows a significant negative partial autocorrelation at lag 2. Once the lag-1 relationship is accounted for, productivity two days earlier is negatively associated with current productivity. As with the ACF, lag 2 refers to two days, not one week, so this is not evidence of a weekly pattern.

The partial autocorrelation at higher lags is not significant, which means that there is no evidence of longer-range dependence in the productivity data.

Overall, the PACF plot is consistent with the ACF plot: the dependence is concentrated in the first two lags, a pattern typical of a low-order autoregressive process. Neither plot on its own establishes that the series is non-stationary, which would require a formal test. The short-lag dependence will need to be taken into account when forecasting future productivity.
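
Stationarity can be checked more directly with a unit-root test. A minimal sketch follows, using PP.test() (the Phillips-Perron test in base R's stats package) on two simulated series. The series are synthetic, and the choice of this particular test is an assumption made for illustration, not part of the analysis above:

```r
set.seed(99)

# A stationary AR(1) series and a random walk (which has a unit root)
stationary <- as.numeric(arima.sim(model = list(ar = 0.5), n = 500))
random_walk <- cumsum(rnorm(500))

# Null hypothesis of PP.test: the series has a unit root (is non-stationary)
p_stationary <- PP.test(stationary)$p.value
p_walk <- PP.test(random_walk)$p.value

c(stationary = p_stationary, random_walk = p_walk)
```

A small p-value rejects the unit-root null, which is the expected outcome for the stationary series. Applying the same call to prod_ts$actual_productivity would give a formal check to set beside the ACF and PACF plots.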