Julius Schmid

This is an application of the data visualization class. In this application, we will perform an analysis of a given data set, including visualizations, time series analysis, and regression analysis.

Data Preparation

First, we import the necessary libraries.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(psych)
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(reshape2)

Let us start by reading the input csv file.

df <- read.csv("XYZ LLC 2021 P&amp;L.CSV")

To see what the data set looks like, we print the first 6 rows, using the head() function.

head(df)
##                         X X01.01.2022 X02.01.2022 X03.01.2022 X04.01.2022
## 1 Ordinary Income/Expense          NA          NA          NA          NA
## 2                  Income          NA          NA          NA          NA
## 3           Rental Income          NA          NA          NA          NA
## 4               Unit 1115    16662.87           0    16662.87    16662.87
## 5               Unit 1126    15500.00       15500    15500.00    15500.00
## 6               Unit 1215    16300.00       16300    16300.00    16300.00
##   X05.01.2022 X06.01.2022 X07.01.2022 X08.01.2022 X09.01.2022 X10.01.2022
## 1          NA          NA          NA          NA          NA          NA
## 2          NA          NA          NA          NA          NA          NA
## 3          NA          NA          NA          NA          NA          NA
## 4    16662.87    16662.87    16662.87    16662.87           0    16662.87
## 5    15500.00    15500.00    15500.00    15500.00       15500    15500.00
## 6    16300.00    16300.00    16300.00    16300.00       16300    16300.00
##   X11.01.2022 X12.01.2022 X01.01.2023 X01.02.2023 X03.01.2023 X04.01.2023
## 1          NA          NA          NA          NA          NA          NA
## 2          NA          NA          NA          NA          NA          NA
## 3          NA          NA          NA          NA          NA          NA
## 4    16662.87           0    16662.87    16662.87    16662.87           0
## 5    15500.00       15500    15500.00    15500.00    15500.00       15500
## 6    16300.00       16300        0.00    16300.00    16300.00       16300

We immediately remark that, for our analysis, it would be better to have the dates as rows and the units as columns. Hence, we transpose the data frame from above.

# Make changes to the original dataset (especially to the headers) 
# => Facilitates the transposing process
df_pre_transpose <- read.csv("XYZ LLC 2021 P&amp;L.CSV", header = FALSE)
colnames(df_pre_transpose) <- c(1:17)
df_pre_transpose[1,1] <- 'Date'

# Transpose the data frame
df_transposed <- data.frame(t(df_pre_transpose[-1]))
colnames(df_transposed) <- df_pre_transpose[, 1]

Again, let us apply the head() function to see what the first 6 rows look like.

head(df_transposed)
##         Date Ordinary Income/Expense Income Rental Income Unit 1115 Unit 1126
## 2 01-01-2022                                               16662.87     15500
## 3 02-01-2022                                                      0     15500
## 4 03-01-2022                                               16662.87     15500
## 5 04-01-2022                                               16662.87     15500
## 6 05-01-2022                                               16662.87     15500
## 7 06-01-2022                                               16662.87     15500
##   Unit 1215 Unit 1223 Unit 1313 Unit 1314 Unit 1315 Total Rental Income Expense
## 2     16300      3000     14761     17232     16225            99680.87        
## 3     16300      3000     14761     17232     16225               83018        
## 4     16300      3000     14761         0     16225            82448.87        
## 5     16300      3000         0     17232     16225            84919.87        
## 6     16300      3000     14761     17232     16225            99680.87        
## 7     16300      3000     14761     17232     16225            99680.87        
##   Bank Charges Bright Star Credit Union Bank Charges - Other Total Bank Charges
## 2                                 63.78               218.97             282.75
## 3                                 63.78               218.97             282.75
## 4                                 63.78               218.97             282.75
## 5                                 63.78               218.97             282.75
## 6                                 63.78               218.97             282.75
## 7                                 63.78               218.97             282.75
##   Commission Unit 1215 Unit 1223 Unit 1315 Total Commission
## 2                 1475                1475             2950
## 3                    0         0         0                0
## 4                    0         0         0                0
## 5                    0      1500         0             1500
## 6                    0         0         0                0
## 7                    0         0         0                0
##   Depreciation Expense Filing Fees HOA Dues Total Electricity Total Utilities
## 2                21802      138.75                     307.01          307.01
## 3                21802      138.75                     307.01          307.01
## 4                21802      138.75                     307.01          307.01
## 5                21802      138.75                     307.01          307.01
## 6                21802      138.75                     307.01          307.01
## 7                21802      138.75                     307.01          307.01
##   Total Expense Net Income
## 2      25787.52   73893.35
## 3      22837.52   60180.48
## 4      22837.52   59611.35
## 5      24337.52   60582.35
## 6      22837.52   76843.35
## 7      22837.52   76843.35

This is exactly what we would expect the data to look like. For further use, we rename this data frame to ‘data’.

data <- df_transposed

After transposing, note that the dataframe contains entire columns with NA values. Our next objective is to cleanse the data, which is done by eliminating the NA values and looking for outliers.

Let us start by eliminating the columns that have NA values. We can do that by using the na.omit() function.

# Replace all empty values with NA
data[data == ''] <- NA

# Remove all columns with NA values
data <- data[ , colSums(is.na(data))==0]

# Print first 6 rows of the new data frame
head(data)
##         Date Unit 1115 Unit 1126 Unit 1215 Unit 1223 Unit 1313 Unit 1314
## 2 01-01-2022  16662.87     15500     16300      3000     14761     17232
## 3 02-01-2022         0     15500     16300      3000     14761     17232
## 4 03-01-2022  16662.87     15500     16300      3000     14761         0
## 5 04-01-2022  16662.87     15500     16300      3000         0     17232
## 6 05-01-2022  16662.87     15500     16300      3000     14761     17232
## 7 06-01-2022  16662.87     15500     16300      3000     14761     17232
##   Unit 1315 Total Rental Income Bright Star Credit Union Bank Charges - Other
## 2     16225            99680.87                    63.78               218.97
## 3     16225               83018                    63.78               218.97
## 4     16225            82448.87                    63.78               218.97
## 5     16225            84919.87                    63.78               218.97
## 6     16225            99680.87                    63.78               218.97
## 7     16225            99680.87                    63.78               218.97
##   Total Bank Charges Unit 1215.1 Unit 1315.1 Total Commission
## 2             282.75        1475        1475             2950
## 3             282.75           0           0                0
## 4             282.75           0           0                0
## 5             282.75           0           0             1500
## 6             282.75           0           0                0
## 7             282.75           0           0                0
##   Depreciation Expense Filing Fees Total Electricity Total Utilities
## 2                21802      138.75            307.01          307.01
## 3                21802      138.75            307.01          307.01
## 4                21802      138.75            307.01          307.01
## 5                21802      138.75            307.01          307.01
## 6                21802      138.75            307.01          307.01
## 7                21802      138.75            307.01          307.01
##   Total Expense Net Income
## 2      25787.52   73893.35
## 3      22837.52   60180.48
## 4      22837.52   59611.35
## 5      24337.52   60582.35
## 6      22837.52   76843.35
## 7      22837.52   76843.35

Next, we remark that most of the columns are interpreted as a character.

sapply(data, class)
##                     Date                Unit 1115                Unit 1126 
##              "character"              "character"              "character" 
##                Unit 1215                Unit 1223                Unit 1313 
##              "character"              "character"              "character" 
##                Unit 1314                Unit 1315      Total Rental Income 
##              "character"              "character"              "character" 
## Bright Star Credit Union     Bank Charges - Other       Total Bank Charges 
##              "character"              "character"              "character" 
##              Unit 1215.1              Unit 1315.1         Total Commission 
##              "character"              "character"              "character" 
##     Depreciation Expense              Filing Fees        Total Electricity 
##              "character"              "character"              "character" 
##          Total Utilities            Total Expense               Net Income 
##              "character"              "character"              "character"

However, to analyze the data, we need to have numeric values in numeric columns. Except for ‘Date’, we transform all character columns into numerical ones.

data[c(2:21)] <- as.data.frame(apply(data[c(2:21)], 2, as.numeric))
sapply(data, class)
##                     Date                Unit 1115                Unit 1126 
##              "character"                "numeric"                "numeric" 
##                Unit 1215                Unit 1223                Unit 1313 
##                "numeric"                "numeric"                "numeric" 
##                Unit 1314                Unit 1315      Total Rental Income 
##                "numeric"                "numeric"                "numeric" 
## Bright Star Credit Union     Bank Charges - Other       Total Bank Charges 
##                "numeric"                "numeric"                "numeric" 
##              Unit 1215.1              Unit 1315.1         Total Commission 
##                "numeric"                "numeric"                "numeric" 
##     Depreciation Expense              Filing Fees        Total Electricity 
##                "numeric"                "numeric"                "numeric" 
##          Total Utilities            Total Expense               Net Income 
##                "numeric"                "numeric"                "numeric"

Now, we can execute calculations within the data.

At this point, we realize that there exist two columns representing Unit 1215, and two columns representing Unit 1315, with differing values between the two rows (no duplicates). We can assume that they belong together because there are two rows named Unit 1215 and two rows named Unit 1315 in in the original (not transposed) data frame. The respective second rows could refer to a later adjustment for the same date. This would make sense since most of the entries for the second Unit 1215 and the second Unit 1315 are 0, respectively.

Since we want to have only one column per distinct column, we aggregate the two Unit 1215 columns together and do the same for Unit 1315. After that, we drop the second columns for both Units to make them distinct.

# Aggregate the two columns for Unit 1215 and Unit 1315
data['Unit 1215'] = data['Unit 1215'] + data['Unit 1215.1']
data['Unit 1315'] = data['Unit 1315'] + data['Unit 1315.1']

# Drop Unit 1215.1 and Unit 1315.1
data <- select(data, -c('Unit 1215.1', 'Unit 1315.1'))

# Print the first rows of the data
head(data)
##         Date Unit 1115 Unit 1126 Unit 1215 Unit 1223 Unit 1313 Unit 1314
## 2 01-01-2022  16662.87     15500     17775      3000     14761     17232
## 3 02-01-2022      0.00     15500     16300      3000     14761     17232
## 4 03-01-2022  16662.87     15500     16300      3000     14761         0
## 5 04-01-2022  16662.87     15500     16300      3000         0     17232
## 6 05-01-2022  16662.87     15500     16300      3000     14761     17232
## 7 06-01-2022  16662.87     15500     16300      3000     14761     17232
##   Unit 1315 Total Rental Income Bright Star Credit Union Bank Charges - Other
## 2     17700            99680.87                    63.78               218.97
## 3     16225            83018.00                    63.78               218.97
## 4     16225            82448.87                    63.78               218.97
## 5     16225            84919.87                    63.78               218.97
## 6     16225            99680.87                    63.78               218.97
## 7     16225            99680.87                    63.78               218.97
##   Total Bank Charges Total Commission Depreciation Expense Filing Fees
## 2             282.75             2950                21802      138.75
## 3             282.75                0                21802      138.75
## 4             282.75                0                21802      138.75
## 5             282.75             1500                21802      138.75
## 6             282.75                0                21802      138.75
## 7             282.75                0                21802      138.75
##   Total Electricity Total Utilities Total Expense Net Income
## 2            307.01          307.01      25787.52   73893.35
## 3            307.01          307.01      22837.52   60180.48
## 4            307.01          307.01      22837.52   59611.35
## 5            307.01          307.01      24337.52   60582.35
## 6            307.01          307.01      22837.52   76843.35
## 7            307.01          307.01      22837.52   76843.35

As a last step in data preprocessing, we need to rename the columns since R might have difficulties to interpret empty spaces within a title. Further, we add an additional binary column “year”, which might be helpful when trying to investigate if there is a difference between the 2022 and the 2023 values.

colnames(data) <- c("Date", "Unit1115", "Unit1126", "Unit1215",
                           "Unit1223", "Unit1313", "Unit1314",
                           "Unit1315", "TotalRentalIncome",          
                           "BrightStarCreditUnion", "BankChargesOther",
                           "TotalBankCharges", "TotalCommission", 
                           "DepreciationExpense", "FillingFees",
                           "TotalElectricity", "TotalUtilities",
                           "TotalExpense", "NetIncome")

year <- c(2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,
          2023,2023,2023,2023)

data$year <- year

Data Visualization

In order to display meaningful graphs, we need to examine the correclations between the variables. To do that, we use the pairs.panels() function out of the psych library. We consider only attributes that have a nonzero standard deviation.

# Select all the columns with a positive standard deviation
column_std <- sapply(data, sd)
## Warning in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm =
## na.rm): NAs introduced by coercion
stdgt0 <- which(column_std > 0)
data_stdgt0 <- data[, stdgt0]

# Print pairwise correlation for all these columns
pairs.panels(data_stdgt0)

The highest correlation can be found between Total Rental Income and Net Income. Let us create a scatter plot of these two columns, using the ggplot() function geom_point(). We add an additional regression line with geom_smooth(). Since many points have the exact same numerical values and are on top of each other, we add some noise with the geom_jitter() function, which helps us to better understand the relative distribution of the different points.

ggplot(data = data, 
       mapping = aes(x = TotalRentalIncome,
                     y = NetIncome)) +
  geom_point(colour = "#f05053") + 
  geom_smooth(method="loess", colour = "#093028") +
  geom_jitter(width = 500, height = 500, colour = "#f05053") +
  labs(x = "Total Rental Income",
       y = "Net Income",
       title = "Net Income in terms of Total Rent Income",
       subtitle = "Data points are different dates",
       caption = "Source: XYZ LLC 2021 P&amp;L.CSV")
## `geom_smooth()` using formula = 'y ~ x'

For a second plot, we compare the different units to each other. We will do this with the aid of different line plots that are displayed in the same graph. We will use the geom_line() function and use only values for 2022, since they have a consistent time difference.

# Select only values out of 2022
index_2022 <- which(year == 2022)
data_2022 <- data[index_2022,]

# Select all the Unit columns together with the Date
data_2022 <- data_2022[,c(1:8)]
data_2022 <- melt(data_2022 ,  id.vars = 'Date', variable.name = 'Unit')

# Create line plot for each Unit
ggplot(data = data_2022,
       mapping = aes(x = Date,
                     y = value,
                     group=Unit)) +
  geom_line(aes(colour = Unit)) +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Date",
       y = "Amount",
       title = "Amount for each Unit against Date",
       subtitle = "Each Line represents one Unit",
       caption = "Source: XYZ LLC 2021 P&amp;L.CSV")

We remark that the units attain constant values most of the time, with single outliers. Further, this graph shows that the amount of Unit 1223 is way lower than the amounts for all other units. The other units are all on a similar level.

Linear Regression To start, we’ll use all variables with a standard greater than 0 as independent variables, assuming that all constant variables don’t have a significant contribution to the model.

By trying to predict several different variables, we note that there are direct dependencies between the variables. Parameters like total rental income, net income, or total commission can be calculated deterministically through a formula without the need of a prediction. Consider this example below. We “predict” the value for total commission through a sum of different other parameters. The R-squared value is 1, which means we theoretically have a “perfect” model.

# Select only values out of 2022
index_2022 <- which(year == 2022)
data_2022 <- data[index_2022,]

# Perform the linear regression
linear_regression <- lm(data_2022$TotalCommission ~ data_2022$Unit1115 + data_2022$Unit1215 + data_2022$Unit1313 + data_2022$Unit1314 + data_2022$Unit1315 + data_2022$NetIncome + data_2022$TotalExpense + data_2022$TotalRentalIncome)


print(summary(linear_regression))
## 
## Call:
## lm(formula = data_2022$TotalCommission ~ data_2022$Unit1115 + 
##     data_2022$Unit1215 + data_2022$Unit1313 + data_2022$Unit1314 + 
##     data_2022$Unit1315 + data_2022$NetIncome + data_2022$TotalExpense + 
##     data_2022$TotalRentalIncome)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.701e-12  0.000e+00  1.327e-13  6.318e-13  6.968e-13 
## 
## Coefficients: (3 not defined because of singularities)
##                               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept)                  2.819e+04  7.294e-11  3.864e+14  < 2e-16 ***
## data_2022$Unit1115           1.000e+00  1.190e-15  8.400e+14  < 2e-16 ***
## data_2022$Unit1215           1.373e-14  2.556e-15  5.373e+00  0.00171 ** 
## data_2022$Unit1313           1.000e+00  1.252e-15  7.990e+14  < 2e-16 ***
## data_2022$Unit1314           1.000e+00  1.192e-15  8.390e+14  < 2e-16 ***
## data_2022$Unit1315                  NA         NA         NA       NA    
## data_2022$NetIncome         -1.000e+00  1.189e-15 -8.409e+14  < 2e-16 ***
## data_2022$TotalExpense              NA         NA         NA       NA    
## data_2022$TotalRentalIncome         NA         NA         NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.261e-12 on 6 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 1.169e+30 on 5 and 6 DF,  p-value: < 2.2e-16

The relevant predictor variables seem to be Unit 1115, Unit 1215, Unit 1313, Unit 1314, and Net Income. Eliminating all other variables gives us:

# Select only values out of 2022
index_2022 <- which(year == 2022)
data_2022 <- data[index_2022,]

# Perform the linear regression
linear_regression <- lm(data_2022$TotalCommission ~ data_2022$Unit1115 + data_2022$Unit1215 + data_2022$Unit1313 + data_2022$Unit1314 + data_2022$NetIncome)


print(summary(linear_regression))
## 
## Call:
## lm(formula = data_2022$TotalCommission ~ data_2022$Unit1115 + 
##     data_2022$Unit1215 + data_2022$Unit1313 + data_2022$Unit1314 + 
##     data_2022$NetIncome)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.701e-12  0.000e+00  1.327e-13  6.318e-13  6.968e-13 
## 
## Coefficients:
##                       Estimate Std. Error    t value Pr(>|t|)    
## (Intercept)          2.819e+04  7.294e-11  3.864e+14  < 2e-16 ***
## data_2022$Unit1115   1.000e+00  1.190e-15  8.400e+14  < 2e-16 ***
## data_2022$Unit1215   1.373e-14  2.556e-15  5.373e+00  0.00171 ** 
## data_2022$Unit1313   1.000e+00  1.252e-15  7.990e+14  < 2e-16 ***
## data_2022$Unit1314   1.000e+00  1.192e-15  8.390e+14  < 2e-16 ***
## data_2022$NetIncome -1.000e+00  1.189e-15 -8.409e+14  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.261e-12 on 6 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 1.169e+30 on 5 and 6 DF,  p-value: < 2.2e-16

This seems to be the accurate model for the calculation of total commission. All p-values are below an assumed alpha value of 0.05.

Consider a more interesting use case: We try to predict the value for a unit. In this case, we want to predict Unit 1215. We could determine its value deterministically again, but now we want to predict its values only with the aid of other Units.

# Select only values out of 2022
index_2022 <- which(year == 2022)
data_2022 <- data[index_2022,]

# Perform the linear regression
linear_regression <- lm(data_2022$Unit1215 ~ data_2022$Unit1115 + data_2022$Unit1313 + data_2022$Unit1314 + data_2022$Unit1315)


print(summary(linear_regression))
## Warning in summary.lm(linear_regression): essentially perfect fit: summary may
## be unreliable
## 
## Call:
## lm(formula = data_2022$Unit1215 ~ data_2022$Unit1115 + data_2022$Unit1313 + 
##     data_2022$Unit1314 + data_2022$Unit1315)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -4.091e-14 -5.822e-15 -2.768e-15  2.768e-15  4.091e-14 
## 
## Coefficients:
##                      Estimate Std. Error    t value Pr(>|t|)    
## (Intercept)         7.500e+01  3.103e-13  2.417e+14   <2e-16 ***
## data_2022$Unit1115 -3.408e-18  1.153e-18 -2.956e+00   0.0212 *  
## data_2022$Unit1313 -3.682e-18  1.491e-18 -2.469e+00   0.0429 *  
## data_2022$Unit1314 -3.563e-18  1.673e-18 -2.130e+00   0.0707 .  
## data_2022$Unit1315  1.000e+00  1.954e-17  5.117e+16   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.631e-14 on 7 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 7.201e+32 on 4 and 7 DF,  p-value: < 2.2e-16

Even in this case, our model seems to be almost perfect. Assuming that our threshold is a p-value of 0.05, we would need to eliminate Unit 1314 next.

# Select only values out of 2022
index_2022 <- which(year == 2022)
data_2022 <- data[index_2022,]

# Perform the linear regression
linear_regression <- lm(data_2022$Unit1215 ~ data_2022$Unit1115 + data_2022$Unit1313 + data_2022$Unit1315)


print(summary(linear_regression))
## Warning in summary.lm(linear_regression): essentially perfect fit: summary may
## be unreliable
## 
## Call:
## lm(formula = data_2022$Unit1215 ~ data_2022$Unit1115 + data_2022$Unit1313 + 
##     data_2022$Unit1315)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -1.631e-14 -9.560e-16 -3.350e-16  4.190e-16  3.262e-14 
## 
## Coefficients:
##                     Estimate Std. Error   t value Pr(>|t|)    
## (Intercept)        7.500e+01  1.672e-13 4.485e+14   <2e-16 ***
## data_2022$Unit1115 0.000e+00  6.031e-19 0.000e+00        1    
## data_2022$Unit1313 0.000e+00  7.861e-19 0.000e+00        1    
## data_2022$Unit1315 1.000e+00  1.041e-17 9.609e+16   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.421e-14 on 8 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 3.292e+33 on 3 and 8 DF,  p-value: < 2.2e-16

In the last step, the p-values for Unit 1115 and Unit 1313 have risen above the threshold and hence will be discarded, too.

# Select only values out of 2022
index_2022 <- which(year == 2022)
data_2022 <- data[index_2022,]

# Perform the linear regression
linear_regression <- lm(data_2022$Unit1215 ~ data_2022$Unit1315)


print(summary(linear_regression))
## Warning in summary.lm(linear_regression): essentially perfect fit: summary may
## be unreliable
## 
## Call:
## lm(formula = data_2022$Unit1215 ~ data_2022$Unit1315)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##      0      0      0      0      0 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              75          0     Inf   <2e-16 ***
## data_2022$Unit1315        1          0     Inf   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0 on 10 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic:   Inf on 1 and 10 DF,  p-value: < 2.2e-16

We reached our final linear regresion model: Unit 1215 can be “approximated” through the sum 75 + 1 * Unit 1315.

Time Series Analysis Lastly, we will do a time series plot of the different units.

plot.ts(data_2022[,c(2:7)], main = "Time Series Analysis of Units in 2022")

Again, we can see that the unit values are mostly constant, with singular outliers.