A review of the data over a 3-year period of program performance — December 2019 to December 2022 — for OregonSaves, Illinois Secure Choice, and CalSavers, highlighting several positive trends.
Note: This report only focuses on Total Assets indicator.
df <- as.data.frame(Assets_by_Program)
datatable(df, options = list(dom = 't'))
In the data preprocessing and cleaning steps, several important procedures were carried out to ensure the quality and completeness of the dataset. These steps are detailed below:
1. Removing Duplicate Rows: Two duplicate rows in the dataset were identified and eliminated using the unique function in R. This process helps to ensure that each data point in the dataset is unique, preventing any duplication that might lead to inaccurate analysis or skewed results.
2. Handling Missing Values in “CalSavers”: In the “CalSavers” column, four rows contained missing values. This occurred because CalSavers did not initiate monthly reporting until June 30, 2020, and instead had a previous practice of reporting on a quarterly basis. To resolve this issue and align the data with the reporting frequency, we applied the Last Observation Carried Forward (LOCF) imputation method. The LOCF technique involves replacing missing values with the most recent observed value in the time series. This method was chosen to ensure that the imputed values accurately reflect the temporal context of the data. By utilizing the most recent available data point, we maintain the continuity of the information, addressing the missing values without disrupting the dataset’s chronological integrity. This approach enhances the dataset’s completeness and allows for more meaningful analysis and interpretation.
3. Filling in Missing “Total” Values: The “Total” column had four missing values as well because Calsavers was missing four values. To calculate the Total value for all the three states, the sum of the “OregonSaves,” “IL Secure Choice,” and “CalSavers” columns for their respective rows was computed using the rowSums function. This step allowed us to ensure that the “Total” column is complete and accurately reflects the total assets under management for each corresponding row.
These data cleaning and preprocessing steps were crucial to enhance the dataset’s reliability and suitability for subsequent analysis. Removing duplicates, imputing missing values, and filling in the “Total” column contribute to the overall data quality and integrity, which is essential for robust and accurate analysis and reporting.
# Cleaned look of the data
df <- as.data.frame(Assets_by_Program)
datatable(df, options = list(dom = 't'))
This plot provides an overview of the growth rates of total assets across different state retirement programs over time. The purpose of this plot is to visualize how the assets under management in these programs have changed month by month.
X-Axis (Date): The horizontal axis represents time, with each point on the axis corresponding to a specific month and year.
Y-Axis (Value): The vertical axis represents the growth rate of total assets. It measures how much the total assets have changed relative to the previous month.
Programs: Each line in the plot represents a different retirement program. The legend on the right side of the plot indicates which program each color corresponds to. In this plot, you can see the growth rates for “OregonSaves,” “IL Secure Choice,” “CalSavers,” and “Total.”
Hover Information: When you hover your cursor over a data point in the plot, you can see additional information, such as the exact growth rate for that month. This information helps you understand the magnitude of the changes.
The plot allows you to visually compare the growth rates of the different state retirement programs over time. You can observe trends, fluctuations, and any significant changes in the assets under management for each program.
library(dplyr)
library(tidyr)
library(plotly)
# Calculate growth rates for each program
your_data <- Assets_by_Program %>%
arrange(Date) %>%
mutate(
OregonSaves_Growth = (OregonSaves / lag(OregonSaves)) - 1,
`IL Secure Choice_Growth` = (`IL Secure Choice` / lag(`IL Secure Choice`)) - 1,
CalSavers_Growth = (CalSavers / lag(CalSavers)) - 1,
Total_Growth = (Total / lag(Total)) - 1
)
# Melt the data to long format
growth_melted <- your_data %>%
select(Date, starts_with("OregonSaves"), starts_with("IL Secure Choice"), starts_with("CalSavers"), starts_with("Total")) %>%
pivot_longer(cols = -Date, names_to = "Program", values_to = "Value")
# Split data into assets and growth
assets_data <- growth_melted %>%
filter(!grepl("_Growth", Program))
growth_data <- growth_melted %>%
filter(grepl("_Growth", Program)) %>%
rename(Value2 = Value, Program2 = Program, Date2 = Date)
# Merge assets and growth data
data <- bind_cols(assets_data, growth_data)
# Create the Plotly plot
plot <- plot_ly(data, x = ~Date, y = ~Value, color = ~Program, text = ~paste("Growth Rate: ", round(Value2 * 100, digits = 2), "%")) %>%
add_lines(line = list(width = 3)) %>%
layout(
title = "State Retirement Programs - Total Assets",
xaxis = list(title = "Date"),
yaxis = list(title = "Value"),
hovermode = "x unified"
)
# Print the plot
plot
working on it…
library(ggplot2)
# Calculate the percentage contribution for each program
percent_data <- Assets_by_Program %>%
mutate(
OregonSaves_pct = (OregonSaves / Total) * 100,
`IL Secure Choice_pct` = (`IL Secure Choice` / Total) * 100,
CalSavers_pct = (CalSavers / Total) * 100
) %>%
mutate(year_month = paste(year(Date), month(Date, label = TRUE, abbr = TRUE))) %>%
filter(year_month %in% c("2019 Dec","2020 Dec", "2021 Dec", "2022 Dec")) %>%
select(Date, OregonSaves_pct, `IL Secure Choice_pct`, CalSavers_pct)
visualized_data <-percent_data %>%
pivot_longer(cols = -Date, names_to = "Program", values_to = "Value")
# Modify the Date column to display only year and month
visualized_data$Date <- format(visualized_data$Date, "%Y %b")
# Create the static stacked bar chart using ggplot2
p <- ggplot(visualized_data, aes(x = Date, y = Value, fill = Program)) +
geom_bar(stat = "identity", position = "stack") +
labs(
title = "Percentage Contribution of Retirement Programs Over Time",
x = "Date",
y = "Percentage"
) +
geom_text(
aes(label = sprintf("%.1f%%", Value), y = Value),
position = position_stack(vjust = 0.5),
size = 3
) +
theme_minimal()
# Print the plot
print(p)
Assets_by_Program$Date <- as.Date(Assets_by_Program$Date)
t<-Assets_by_Program %>%
select(Date, Total)
ts_data <- ts(t[, -1], start = c(2020, 12), frequency = 12)
ggsubseriesplot(ts_data)
working on it…
# Assuming your data frame is named 'data'
# Select the columns representing the state programs
state_programs <- Assets_by_Program[, c("OregonSaves", "IL Secure Choice", "CalSavers")]
# Calculate the correlation matrix
correlation_matrix <- cor(state_programs, use = "pairwise.complete.obs")
# Convert the correlation matrix to a data frame for DT
correlation_df <- as.data.frame(correlation_matrix)
# Create an interactive DT table
datatable(correlation_df, options = list(dom = 't'))
Interpretation: the correlation matrix shows that the asset values of all three state programs, “OregonSaves,” “IL Secure Choice,” and “CalSavers,” are highly positively correlated. When one program’s assets increase, the others tend to increase as well, and when one program’s assets decrease, the others tend to decrease. These strong positive correlations suggest that there is a common underlying factor or influence affecting the asset values of these programs, and they move together in a similar direction.