Group Activity (Module 4)

Correlation of Number of Sessions and Total kWh Usage in Real-Time EV Charging Transactions

Introduction

This report describes a high-resolution dataset of electric vehicle (EV) charging transactions from a large workplace charging program with sub-metering at the station level. The real-time data covers the charging usage of various EV consumer types in a large corporate setting, such as managers and non-managers, habitual and casual users, and early and late adopters. Users and proponents of this research can use the data to understand EV station consumption at a micro level. The information provided can help maximize and optimize revenue models and resource utilization strategies. Scholars globally can use the data to address issues in several areas of the industry. Used to its full potential, it can support innovation in power systems, economics, policy making, energy data management, analytics, and visualization.

The central industry problem that the researchers of the paper wanted to address is poor network interoperability in EV infrastructure, where real-time data on usage, consumption, and performance is not easily accessible across platforms and technologies. The researchers therefore presented a “high-resolution dataset of real-time EV charging transactions” recorded to the nearest second over a period of one year across multiple sites of a corporate campus. The study captures data from 105 charging stations across 25 facilities operated by a single firm participating in the U.S. Department of Energy Workplace Charging Challenge. The Workplace Charging Challenge aims to make more charging stations available in workplaces, adding value by increasing employee satisfaction and reducing carbon emissions.\(^{[1]}\) The study yielded more than 3395 real-time transactions from 85 users across both free and paid sessions.

Methodology

The data used in this study was gathered from a large workplace charging program in the United States. The logged charging sessions contained several useful pieces of information to help with identification. These included user-related data such as vehicle type, location-related data such as the specific charging station, and usage data such as the energy output in kWh. All the data was recorded under a common set of conditions, automatically and digitally, which reduces the potential for human error and improves the accuracy of the data. To test for the correlation between the total number of sessions and the total kWh usage, the statistical test was changed to simple linear regression. A regression model was fitted, and a test for correlation was then carried out as well. Most of the computations were performed with R functions, following the approach of the original researchers, in order to reduce the possibility of human error.

Results

For this situation, we want to test the significance of the regression model and of the correlation at a 95% confidence level (\(\alpha\) = 0.05).

To visualize, here is the scatter diagram for the data used in this situation:

To test for the regression model’s significance:

Null Hypothesis: \(\beta_1\) = 0

Alternative Hypothesis: \(\beta_1\) \(\ne\) 0

Test Statistic:

The data set is large, so we use the R console to compute the test statistic.

Reject \(H_0\) if:

P-value < 0.05 or \(f_0\) > \(f_{\alpha,1,n-2}\)

Computations:

This is done using the code below:

simpleregress <- lm(TOTSES ~ TOTKWH, data = newdata)
summary(simpleregress)

## 
## Call:
## lm(formula = TOTSES ~ TOTKWH, data = newdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -98.781 -38.886  -4.627  28.705 110.994 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 100.7807     1.9346  52.095  < 2e-16 ***
## TOTKWH       -1.9940     0.2981  -6.689 2.61e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50.24 on 3393 degrees of freedom
## Multiple R-squared:  0.01302,    Adjusted R-squared:  0.01273 
## F-statistic: 44.75 on 1 and 3393 DF,  p-value: 2.613e-11

We obtain the coefficients of the regression model from the lm function, so our regression model is: \[ \hat{y} ~=~100.7807~-~1.9940x \] where \(\hat{y}\) is the predicted number of sessions (TOTSES) and \(x\) is the total kWh usage (TOTKWH). The lm output also gives

\(f_0\) = 44.75 and a p-value of \(2.613 \times 10^{-11}\)
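As a quick consistency check on these values: in a simple linear regression with a single predictor, the F statistic equals the square of the t statistic of the slope. Using the t value reported for TOTKWH in the lm output:

\[ f_0 = t_0^2 = (-6.689)^2 \approx 44.74 \]

This agrees with the reported \(f_0\) = 44.75 up to rounding of the t value, and it explains why the slope's p-value and the overall F test's p-value are identical.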

And based on the criteria for rejection: \[ f_0 = 44.75 > f_{0.05,1,3393} = 3.84 \quad \text{and} \quad \text{p-value} = 2.613 \times 10^{-11} < 0.05 \]

so we reject \(H_0\) and can therefore conclude that the regression is significant.
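The critical value in the rejection criterion can be reproduced with R's quantile functions rather than read from a table. The following is a minimal sketch, assuming n = 3395 observations (inferred from the 3393 residual degrees of freedom in the lm output); the variable names are illustrative and are not part of the original analysis.

```r
# Critical values for the regression test, assuming n = 3395 observations
# (n - 2 = 3393 matches the residual degrees of freedom in the lm output).
alpha <- 0.05
n <- 3395
df_resid <- n - 2

# Upper-tail F critical value with 1 and n - 2 degrees of freedom
f_crit <- qf(1 - alpha, df1 = 1, df2 = df_resid)

# Two-sided t critical value with n - 2 degrees of freedom
t_crit <- qt(1 - alpha / 2, df = df_resid)

print(round(c(f_crit = f_crit, t_crit = t_crit), 3))

# Decision rule applied to the statistics reported by lm
f0 <- 44.75    # F statistic
t0 <- -6.689   # t value of the TOTKWH slope
reject_f <- f0 > f_crit
reject_t <- abs(t0) > t_crit
print(c(reject_f = reject_f, reject_t = reject_t))
```

Because the F test has one numerator degree of freedom, `f_crit` is the square of `t_crit`, so the two decision rules always agree.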

Then we test for true correlation where:

Null Hypothesis: \(H_0\): \(\rho\) = 0

Alternative Hypothesis: \(H_1\): \(\rho\) \(\ne\) 0

Test statistic: We will again use the console to compute the t test statistic.

Reject \(H_0\) if: \(|t_0|\) > \(t_{\alpha/2,n-2}\)

Computations:

We use the code below to get the test statistic for correlation:

corellate <- cor.test(TOTKWH, TOTSES, method = "pearson") 
corellate

## 
##  Pearson's product-moment correlation
## 
## data:  TOTKWH and TOTSES
## t = -6.6892, df = 3393, p-value = 2.613e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.14716327 -0.08075797
## sample estimates:
##        cor 
## -0.1140881

Based on the result, our t value is -6.6892, and based on our criterion:

\[ |t_0| = 6.6892 > t_{0.025,3393} = 1.960 \]

Hence, we reject \(H_0\) and conclude that the correlation coefficient \(\rho\) \(\ne\) 0.
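The t value reported by cor.test can also be recovered by hand from the sample correlation. With \(r\) = -0.1140881 and \(n\) = 3395 (so that \(n - 2\) = 3393, the degrees of freedom in the output), the standard test statistic for \(H_0\): \(\rho\) = 0 is:

\[ t_0 = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.1140881\sqrt{3393}}{\sqrt{1-(-0.1140881)^2}} \approx \frac{-6.6456}{0.99347} \approx -6.689 \]

This matches the console value of -6.6892 up to rounding.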

Discussion

Since the critical value is \(t_{0.025,3393} = 1.960\), the test statistic \(|t_0| = 6.6892\) lies far into the critical region, so \(H_0: \rho=0\) is rejected. The p-value also supports rejecting \(H_0\): \(P = 2.613 \times 10^{-11}\), as reported by both the lm and cor.test functions in R.

Furthermore, the lm output indicates that \(\beta_1 \neq 0\), which also supports rejecting \(H_0\). The fitted regression line is \(\hat{y} ~=~100.7807~-~1.9940x\). We also found that \(f_0 = 44.75\), which is greater than \(f_{0.05,1,3393} = 3.84\), so \(H_0\) is once again rejected. Rejecting \(H_0\) means that there is a statistically significant linear correlation between the two variables. The relationship is negative and weak, however: the sample correlation is \(r = -0.114\) and \(R^2 = 0.013\), so the model explains only about 1.3% of the variation in the number of sessions.
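The regression test and the correlation test reach the same conclusion because, with a single predictor, they are the same test. The sketch below illustrates this on R's built-in `cars` data, which stands in for the EV charging data (not reproduced here); the variable names are illustrative.

```r
# Illustration on R's built-in `cars` data (NOT the EV charging data):
# in simple linear regression, the slope t test, the overall F test,
# and the Pearson correlation test are equivalent.
fit <- lm(dist ~ speed, data = cars)
ct  <- cor.test(cars$speed, cars$dist, method = "pearson")

slope_t <- unname(coef(summary(fit))["speed", "t value"])
f_stat  <- unname(summary(fit)$fstatistic["value"])
r       <- unname(ct$estimate)

# Each comparison is TRUE when the two quantities agree numerically
same_t  <- isTRUE(all.equal(slope_t, unname(ct$statistic)))  # slope t vs. correlation t
f_is_t2 <- isTRUE(all.equal(f_stat, slope_t^2))              # F vs. t squared
r2_is_R2 <- isTRUE(all.equal(summary(fit)$r.squared, r^2))   # R-squared vs. r squared

print(c(same_t = same_t, f_is_t2 = f_is_t2, r2_is_R2 = r2_is_R2))
```

Applied to the results in this report, the same identities hold: \((-6.6892)^2 \approx 44.75\) and \((-0.1140881)^2 \approx 0.01302\), the Multiple R-squared in the lm output.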

Conclusion

The dataset provides a good basis for regulating EV charging stations. Since most EV charging stations are owned by private entities, information about EV stations is difficult to obtain. This dataset would be beneficial for groups looking to establish their own EV charging stations, as the information is useful for optimizing their stations and creating resource utilization strategies.

Reference:

[1] U.S. Department of Energy, “PLUGGING INTO THE CHALLENGE,” in Workplace Charging Challenge, energy.gov, 2016, p. 3. Available: https://www.energy.gov/sites/prod/files/2017/01/f34/WPCC_2016%20Annual%20Progress%20Report.pdf. [Accessed 7 August 2021].