Power Plants
MISC
Introduction
This data has been gathered at two solar power plants in India over a 34 day period. It has two pairs of files - each pair has one power generation dataset and one sensor readings dataset. The power generation datasets are gathered at the inverter level - each inverter has multiple lines of solar panels attached to it. The sensor data is gathered at a plant level - single array of sensors optimally placed at the plant.
There are a few areas of concern at the solar power plant -
- Can we predict the power generation for next couple of days? - this allows for better grid management
- Can we identify the need for panel cleaning/maintenance?
- Can we identify faulty or sub-optimally performing equipment?
Ideas addressing concerns
- ARIMA / PROPHET modeling
- at the inverter level, and then rollup (aggregate sum) to power plant level
- inverter data set
- anomaly detection, boxplots
- sensor data set
- anomaly detection, boxplots
MISC Ideas
- If we knew the exact location of the powerplants, we could get local weather information
- find correlation of weather (sunny/cloudy days) to daily yield to discover normal/abnormal inverters & sensors
Resources / Business Knowledge
DATA DICTIONARY for Power Generation Inverter data sets
- AC_POWER : Amount of AC power generated by the inverter (source_key) in this 15 minute interval. Units - kW.
- AC_ : Amount of DC power generated by the inverter (source_key) in this 15 minute interval. Units - kW.
- DAILY_YIELD : Daily yield is a cumulative sum of power generated on that day, till that point in time.
- DATE_TIME : Date and time for each observation. Observations recorded at 15 minute intervals.
- PLANT_ID : Plant ID - this will be common for the entire file.
- SOURCE_KEY : Source key in this file stands for the inverter id.
- TOTAL_YIELD : This is the total yield for the inverter till that point in time.
Power Generation Inverter data set: Plant 1
Get and Clean Data
p1.gd = read_csv('Plant_1_Generation_Data.csv') %>%
slice_sample(prop = 1) %>% #!!<NOTE>temp, working with a sample of datset for speed purposes
clean_names() %>% #lowercase
select(sort(tidyselect::peek_vars())) %>% #sort cols alphabetically
select(where(is.factor),where(is.character),where(is.numeric)) #sort cols by data type
#OlsonNames()
#https://stackoverflow.com/questions/41479008/what-is-the-correct-tz-database-time-zone-for-india
p1.gd = p1.gd %>% mutate(
date_time = as.POSIXct(strptime(p1.gd$date_time, "%d-%m-%Y %H:%M"), tz = 'Asia/Kolkata'),
source_key = factor(p1.gd$source_key)
) %>% rename(inverter = source_key)
p1.gd$plant_id = NULLglimpse structure, and sample rows
## Rows: 68,778
## Columns: 6
## $ date_time <dttm> 2020-05-28 05:15:00, 2020-05-24 17:15:00, 2020-05-16 0...
## $ inverter <fct> 3PZuoBAID5Wc2HD, 3PZuoBAID5Wc2HD, 1BY6WEcLGh8j5v7, adLQ...
## $ ac_power <dbl> 0.000000, 239.728571, 0.000000, 710.685714, 339.371429,...
## $ daily_yield <dbl> 0.00000, 8180.28571, 0.00000, 4575.14286, 205.71429, 47...
## $ dc_power <dbl> 0.00000, 2444.57143, 0.00000, 7258.00000, 3454.00000, 5...
## $ total_yield <dbl> 7085451, 7061448, 6265313, 6487309, 7200435, 6226413, 7...
EDA: Factor Vars
counts each factor’s unique levels
## .
## inverter 22
reference: names of unique levels
## inverter
## 1 3PZuoBAID5Wc2HD
## 2 1BY6WEcLGh8j5v7
## 3 adLQvlD726eNBSB
## 4 YxYtjZvoooNbGkE
## 5 ih0vzX44oOqAx2f
## 6 McdE0feGgRqW7Ca
## 7 uHbuxQJl8lW7ozc
## 8 sjndEbLyjtCKgGv
## 9 pkci93gMrogZuBj
## 10 ZoEaEvLYb1n2sOq
## 11 VHMLBKoKgIrUVDU
## 12 zBIq5rxdHJRwDNY
## 13 bvBOhCH3iADSZry
## 14 ZnxXDlPa8U1GXgE
## 15 1IF53ai7Xc0U56Y
## 16 z9Y9gH1T5YWrNuG
## 17 wCURE6d3bPkepu2
## 18 iCRJl6heRkivqQ3
## 19 zVJPv84UY57bAof
## 20 WRmjgnKYAwPKWDb
## 21 rGa61gmuvPhdLxV
## 22 7JYdWkrLSPkdwr4
viz: distribution of level counts
jpal = colorRampPalette(brewer.pal(8,'Dark2'))(22)
p1.gd %>% count(inverter) %>% plot_ly(y = ~fct_reorder(inverter,n), x = ~n, color = ~inverter, colors = jpal) %>% add_bars(hoverinfo = 'text', text = ~n) %>% hide_legend() %>% layout(
title = 'Source Key Counts',
xaxis = list(title = ''),
yaxis = list(title = '')
) ## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
EDA: Numeric Vars
viz bivariate numeric distribution
viz: numeric univariate distributions
names.numeric = p1.gd %>% select(where(is.numeric)) %>% names
p1.gd %>% dlookr::plot_normality(
names.numeric[1],
names.numeric[2],
names.numeric[3]
)viz: numeric univariate distributions
viz: distributions by ‘inverter’ factor
p1.gd %>% mutate(inverter = fct_reorder(.f = inverter, .x = daily_yield, .fun = median, .desc = TRUE)) %>%
plot_ly(y = ~inverter, x = ~daily_yield, color = ~inverter, colors = jpal) %>% add_boxplot()%>%
hide_legend() %>% layout(xaxis = list(title = ''), yaxis = list(title = ''), title = 'Distribution of Daily Yield by Inverter')p1.gd %>% mutate(inverter = fct_reorder(.f = inverter, .x = ac_power, .fun = median, .desc = TRUE)) %>%
plot_ly(y = ~inverter, x = ~ac_power, color = ~inverter, colors = jpal) %>% add_boxplot()%>%
hide_legend() %>% layout(xaxis = list(title = ''), yaxis = list(title = ''), title = 'Distribution of AC Power by Inverter')p1.gd %>% mutate(inverter = fct_reorder(.f = inverter, .x = dc_power, .fun = median, .desc = TRUE)) %>%
plot_ly(y = ~inverter, x = ~dc_power, color = ~inverter, colors = jpal) %>% add_boxplot()%>%
hide_legend() %>% layout(xaxis = list(title = ''), yaxis = list(title = ''), title = 'Distribution of DC Power by Inverter')viz: cleaning/maintenance days
p1.gd %>% arrange(date_time) %>% group_nest(inverter) %>% unnest
p1.gd.plots = p1.gd %>% arrange(date_time) %>% group_nest(inverter) %>% mutate(
plots = map2(
.x = data,
.y = inverter,
~ggplot(data = .x, aes(date_time, daily_yield, color = daily_yield == 0, group = 1)) +
geom_line(size = 1.2) + scale_color_manual(values = c('black','red')) + ggtitle(paste0('Inverter: ', .y))
))
p1.gd.plots$plotsEDA: Time Series Viz
Anomoly Plot
library(anomalize)
# anomalize(data, target, method = c("iqr", "gesd"), alpha = 0.05, max_anoms = 0.2, verbose = FALSE)
# alpha: Controls the width of the "normal" range. Lower values are more conservative while higher values are less prone to incorrectly classifying "normal" observations.
# max_anoms: The maximum percent of anomalies permitted to be identified.
p1.gd.anomalize = p1.gd %>% arrange(date_time) %>%
mutate(inverter = fct_reorder(inverter, -daily_yield)) %>%
group_by(inverter) %>%
time_decompose(daily_yield, method = 'twitter', merge = TRUE) %>%
anomalize(remainder, alpha = 0.05, method = 'gesd') %>%
time_recompose()
p1.gd.anomalize %>%
plot_anomalies(
ncol = 1,
alpha_dots = 0.5,
alpha_circles = 0.5,
size_circles = 1.5,
time_recomposed = TRUE,
alpha_ribbon = 0.05
) + scale_y_continuous(labels = comma) +
labs(x = '', y = 'daily_yield', title = 'daily_yield')Power Generation Inverter data set: Plant 2
Get and Clean Data
p2.gd = read_csv('Plant_2_Generation_Data.csv') %>%
slice_sample(prop = 1) %>% #!!<NOTE>temp, working with a sample of datset for speed purposes
clean_names() %>% #lowercase
select(sort(tidyselect::peek_vars())) %>% #sort cols alphabetically
select(where(is.POSIXct), where(is.factor),where(is.character),where(is.numeric)) #sort cols by data type
#OlsonNames()
#https://stackoverflow.com/questions/41479008/what-is-the-correct-tz-database-time-zone-for-india
p2.gd = p2.gd %>% mutate(
source_key = factor(p2.gd$source_key),
) %>% rename(inverter = source_key)
p2.gd$plant_id = NULLglimpse structure, and sample rows
## Rows: 67,698
## Columns: 6
## $ date_time <dttm> 2020-05-19 06:15:00, 2020-06-07 18:15:00, 2020-06-08 1...
## $ inverter <fct> mqwcsP2rE7J0TFp, V94E5Ben1TlhnDV, vOuJvMaM2sgwLmb, 4UPU...
## $ ac_power <dbl> 48.446667, 55.900000, 996.786667, 510.560000, 72.246667...
## $ daily_yield <dbl> 7.8000000, 1864.8000000, 7536.7333333, 470.7333333, 19....
## $ dc_power <dbl> 50.080000, 57.753333, 1020.840000, 520.833333, 74.48000...
## $ total_yield <dbl> 593612056, 1412242458, 2374225, 2644516, 2397742, 59377...
EDA: Factor Vars
counts each factor’s unique levels
## .
## inverter 22
reference: names of unique levels
## inverter
## 1 mqwcsP2rE7J0TFp
## 2 V94E5Ben1TlhnDV
## 3 vOuJvMaM2sgwLmb
## 4 4UPUqMRk7TRMgml
## 5 rrq4fwE8jgrTyWY
## 6 LYwnQax7tkwH5Cb
## 7 IQ2d7wF4YD8zU1Q
## 8 81aHJ1q11NBPMrL
## 9 NgDl19wMapZy17u
## 10 Mx2yZCDsyf6DPfv
## 11 oZ35aAeoifZaQzV
## 12 Quc1TzYxW2pYoWX
## 13 Qf4GUc1pJu5T6c6
## 14 xoJJ8DcxJEcupym
## 15 oZZkBaNadn6DNKz
## 16 LlT2YUhhzqhg5Sw
## 17 xMbIugepa2P7lBB
## 18 Et9kgGMDl729KT4
## 19 9kRcWv60rDACzjR
## 20 WcxssY2VbP4hApt
## 21 PeE6FRyGXUgsRhN
## 22 q49J1IKaHRwDQnt
viz: distribution of level counts
jpal = colorRampPalette(brewer.pal(8,'Dark2'))(22)
p2.gd %>% count(inverter) %>% plot_ly(y = ~fct_reorder(inverter,n), x = ~n, color = ~inverter, colors = jpal) %>% add_bars(hoverinfo = 'text', text = ~n) %>% hide_legend() %>% layout(
title = 'Source Key Counts',
xaxis = list(title = ''),
yaxis = list(title = '')
) EDA: Numeric Vars
viz bivariate numeric distribution
viz: numeric univariate distributions
names.numeric = p2.gd %>% select(where(is.numeric)) %>% names
p2.gd %>% dlookr::plot_normality(
names.numeric[1],
names.numeric[2],
names.numeric[3],
names.numeric[4]
)viz: numeric univariate distributions
viz: distributions by ‘inverter’ factor
p2.gd %>% mutate(inverter = fct_reorder(.f = inverter, .x = daily_yield, .fun = median, .desc = TRUE)) %>%
plot_ly(y = ~inverter, x = ~daily_yield, color = ~inverter, colors = jpal) %>% add_boxplot()%>%
hide_legend() %>% layout(xaxis = list(title = ''), yaxis = list(title = ''), title = 'Distribution of Daily Yield by Inverter')p2.gd %>% mutate(inverter = fct_reorder(.f = inverter, .x = ac_power, .fun = median, .desc = TRUE)) %>%
plot_ly(y = ~inverter, x = ~ac_power, color = ~inverter, colors = jpal) %>% add_boxplot()%>%
hide_legend() %>% layout(xaxis = list(title = ''), yaxis = list(title = ''), title = 'Distribution of AC Power by Inverter')p2.gd %>% mutate(inverter = fct_reorder(.f = inverter, .x = dc_power, .fun = median, .desc = TRUE)) %>%
plot_ly(y = ~inverter, x = ~dc_power, color = ~inverter, colors = jpal) %>% add_boxplot()%>%
hide_legend() %>% layout(xaxis = list(title = ''), yaxis = list(title = ''), title = 'Distribution of DC Power by Inverter')viz: cleaning/maintenance days
p2.gd %>% arrange(date_time) %>% group_nest(inverter) %>% unnest
p2.gd.plots = p2.gd %>% arrange(date_time) %>% group_nest(inverter) %>% mutate(
plots = map2(
.x = data,
.y = inverter,
~ggplot(data = .x, aes(date_time, daily_yield, color = daily_yield == 0, group = 1)) +
geom_line(size = 1.2) + scale_color_manual(values = c('black','red')) + ggtitle(paste0('Inverter: ', .y))
))
p2.gd.plots$plotsEDA: Time Series Viz
Anomoly Plot
library(anomalize)
# anomalize(data, target, method = c("iqr", "gesd"), alpha = 0.05, max_anoms = 0.2, verbose = FALSE)
# alpha: Controls the width of the "normal" range. Lower values are more conservative while higher values are less prone to incorrectly classifying "normal" observations.
# max_anoms: The maximum percent of anomalies permitted to be identified.
p2.gd.anomalize = p2.gd %>% arrange(date_time) %>%
mutate(inverter = fct_reorder(inverter, -daily_yield)) %>%
group_by(inverter) %>%
time_decompose(daily_yield, method = 'twitter', merge = TRUE) %>%
anomalize(remainder, alpha = 0.05, method = 'gesd') %>%
time_recompose()
p2.gd.anomalize %>%
plot_anomalies(
ncol = 1,
alpha_dots = 0.5,
alpha_circles = 0.5,
size_circles = 1.5,
time_recomposed = TRUE,
alpha_ribbon = 0.05
) + scale_y_continuous(labels = comma) +
labs(x = '', y = 'daily_yield', title = 'daily_yield')DATA DICTIONARY for Sensor Reading data sets
- IRRADIATION: Amount of irradiation for the 15 minute interval.
- DATE_TIME: Date and time for each observation. Observations recorded at 15 minute intervals.
- PLANT_ID: Plant ID - this will be common for the entire file.
- SOURCE_KEY: Stands for the sensor panel id. This will be common for the entire file because there’s only one sensor panel for the plant.
- MODULE_TEMPERATURE: There’s a module (solar panel) attached to the sensor panel. This is the temperature reading for that module.
- AMBIENT_TEMPERATURE: This is the ambient temperature at the plant.
Sensor Data Set: Plant 1
Get and Clean Data
p1.ws = read_csv('Plant_1_Weather_Sensor_Data.csv') %>%
slice_sample(prop = 1) %>% #!!<NOTE>temp, working with a sample of datset for speed purposes
clean_names() %>% #lowercase
mutate(across(where(is.character),factor)) %>%
select(sort(tidyselect::peek_vars())) %>% #sort cols alphabetically
select(date_time, where(is.factor), where(is.numeric)) %>% #sort cols by data type
arrange(date_time)
p1.ws$plant_id = NULL
p1.ws$source_key = NULLglimpse structure
## Rows: 3,182
## Columns: 4
## $ date_time <dttm> 2020-05-15 00:00:00, 2020-05-15 00:15:00, 2020...
## $ ambient_temperature <dbl> 25.18432, 25.08459, 24.93575, 24.84613, 24.6215...
## $ irradiation <dbl> 0.0000000000, 0.0000000000, 0.0000000000, 0.000...
## $ module_temperature <dbl> 22.85751, 22.76167, 22.59231, 22.36085, 22.1654...
EDA: Numeric Vars
viz: numeric univariate distributions
viz: numeric univariate outlier check
## # A tibble: 3 x 6
## variables outliers_cnt outliers_ratio outliers_mean with_mean without_mean
## <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 ambient_temp~ 0 0 NaN 25.5 25.5
## 2 irradiation 2 0.0629 1.19 0.228 0.228
## 3 module_tempe~ 0 0 NaN 31.1 31.1
correlations: viz
p1.ws %>% select(where(is.numeric)) %>% GGally::ggcorr(low = '#990000', mid = '#E0E0E0', high = '#009900', label = TRUE)
viz: Time Series w/IQR percentiles
ggplotly(p1.ws %>% ggplot(aes(date_time, ambient_temperature)) +
geom_line(size = 1.2) +
geom_hline(yintercept = quantile(p1.ws$ambient_temperature, 0.25), lty = 'dashed', alpha = 0.7, color = 'red', size = 1.1) +
geom_hline(yintercept = quantile(p1.ws$ambient_temperature, 0.50), lty = 'dashed', alpha = 0.7, color = 'blue', size = 1.1) +
geom_hline(yintercept = quantile(p1.ws$ambient_temperature, 0.75), lty = 'dashed', alpha = 0.7, color = 'red', size = 1.1)
) %>% layout(title = 'ambient_temperature')ggplotly(p1.ws %>% ggplot(aes(date_time, irradiation)) +
geom_line(size = 1.2) +
geom_hline(yintercept = quantile(p1.ws$irradiation, 0.25), lty = 'dashed', alpha = 0.7, color = 'red', size = 1.1) +
geom_hline(yintercept = quantile(p1.ws$irradiation, 0.50), lty = 'dashed', alpha = 0.7, color = 'blue', size = 1.1) +
geom_hline(yintercept = quantile(p1.ws$irradiation, 0.75), lty = 'dashed', alpha = 0.7, color = 'red', size = 1.1)
) %>% layout(title = 'irradiation')ggplotly(p1.ws %>% ggplot(aes(date_time, module_temperature)) +
geom_line(size = 1.2) +
geom_hline(yintercept = quantile(p1.ws$module_temperature, 0.25), lty = 'dashed', alpha = 0.7, color = 'red', size = 1.1) +
geom_hline(yintercept = quantile(p1.ws$module_temperature, 0.50), lty = 'dashed', alpha = 0.7, color = 'blue', size = 1.1) +
geom_hline(yintercept = quantile(p1.ws$module_temperature, 0.75), lty = 'dashed', alpha = 0.7, color = 'red', size = 1.1)
) %>% layout(title = 'module_temperature')p1.ws %>% pivot_longer(ambient_temperature:module_temperature) %>% ggplot(aes(date_time, value, color = name)) + geom_line(size = 1.1) + facet_wrap(~name, ncol = 1, scales = 'free_y') + theme(legend.position = 'none')Anomoly Plot
library(anomalize)
# anomalize(data, target, method = c("iqr", "gesd"), alpha = 0.05, max_anoms = 0.2, verbose = FALSE)
# alpha: Controls the width of the "normal" range. Lower values are more conservative while higher values are less prone to incorrectly classifying "normal" observations.
# max_anoms: The maximum percent of anomalies permitted to be identified.
ggplotly(
p1.ws %>% arrange(date_time) %>%
time_decompose(ambient_temperature, method = 'twitter', merge = TRUE) %>%
anomalize(remainder, alpha = 0.15, method = 'gesd') %>%
time_recompose() %>%
plot_anomalies(
ncol = 2,
alpha_dots = 0.5,
alpha_circles = 0.5,
size_circles = 2,
time_recomposed = TRUE,
alpha_ribbon = 0.05
) + scale_y_continuous(labels = comma) +
labs(x = '', y = 'ambient_temperature', title = 'ambient_temperature')
)## Converting from tbl_df to tbl_time.
## Auto-index message: index = date_time
## frequency = 96 minutes
## median_span = 1591 minutes
ggplotly(
p1.ws %>% arrange(date_time) %>%
time_decompose(irradiation, method = 'twitter', merge = TRUE) %>%
anomalize(remainder, alpha = 0.05, method = 'gesd') %>%
time_recompose() %>%
plot_anomalies(
ncol = 2,
alpha_dots = 0.5,
alpha_circles = 0.5,
size_circles = 2,
time_recomposed = TRUE,
alpha_ribbon = 0.05
) + scale_y_continuous(labels = comma) +
labs(x = '', y = 'irradiation', title = 'irradiation')
)## Converting from tbl_df to tbl_time.
## Auto-index message: index = date_time
## frequency = 96 minutes
## median_span = 1591 minutes
ggplotly(
p1.ws %>% arrange(date_time) %>%
time_decompose(module_temperature, method = 'twitter', merge = TRUE) %>%
anomalize(remainder, alpha = 0.05, method = 'gesd') %>%
time_recompose() %>%
plot_anomalies(
ncol = 2,
alpha_dots = 0.5,
alpha_circles = 0.5,
size_circles = 2,
time_recomposed = TRUE,
alpha_ribbon = 0.05
) + scale_y_continuous(labels = comma) +
labs(x = '', y = 'module_temperature', title = 'module_temperature')
)## Converting from tbl_df to tbl_time.
## Auto-index message: index = date_time
## frequency = 96 minutes
## median_span = 1591 minutes
Sensor Data Set: Plant 2
Get and Clean Data
p2.ws = read_csv('Plant_2_Weather_Sensor_Data.csv') %>%
slice_sample(prop = 1) %>% #!!<NOTE>temp, working with a sample of datset for speed purposes
clean_names() %>% #lowercase
mutate(across(where(is.character),factor)) %>%
select(sort(tidyselect::peek_vars())) %>% #sort cols alphabetically
select(date_time, where(is.factor), where(is.numeric)) %>% #sort cols by data type
arrange(date_time)
p2.ws$plant_id = NULL
p2.ws$source_key = NULLglimpse structure
## Rows: 3,259
## Columns: 4
## $ date_time <dttm> 2020-05-15 00:00:00, 2020-05-15 00:15:00, 2020...
## $ ambient_temperature <dbl> 27.00476, 26.88081, 26.68206, 26.50059, 26.5961...
## $ irradiation <dbl> 0.000000000, 0.000000000, 0.000000000, 0.000000...
## $ module_temperature <dbl> 25.06079, 24.42187, 24.42729, 24.42068, 25.0882...
EDA: Numeric Vars
viz: numeric univariate distributions
viz: numeric univariate outlier check
## # A tibble: 3 x 6
## variables outliers_cnt outliers_ratio outliers_mean with_mean without_mean
## <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 ambient_temp~ 0 0 NaN 28.1 28.1
## 2 irradiation 1 0.0307 1.10 0.233 0.232
## 3 module_tempe~ 2 0.0614 66.3 32.8 32.8
correlations: viz
p2.ws %>% select(where(is.numeric)) %>% GGally::ggcorr(low = '#990000', mid = '#E0E0E0', high = '#009900', label = TRUE)viz: Time Series w/IQR percentiles
ggplotly(p2.ws %>% ggplot(aes(date_time, ambient_temperature)) +
geom_line(size = 1.2) +
geom_hline(yintercept = quantile(p2.ws$ambient_temperature, 0.25), lty = 'dashed', alpha = 0.7, color = 'red', size = 1.1) +
geom_hline(yintercept = quantile(p2.ws$ambient_temperature, 0.50), lty = 'dashed', alpha = 0.7, color = 'blue', size = 1.1) +
geom_hline(yintercept = quantile(p2.ws$ambient_temperature, 0.75), lty = 'dashed', alpha = 0.7, color = 'red', size = 1.1)
) %>% layout(title = 'ambient_temperature')ggplotly(p2.ws %>% ggplot(aes(date_time, irradiation)) +
geom_line(size = 1.2) +
geom_hline(yintercept = quantile(p2.ws$irradiation, 0.25), lty = 'dashed', alpha = 0.7, color = 'red', size = 1.1) +
geom_hline(yintercept = quantile(p2.ws$irradiation, 0.50), lty = 'dashed', alpha = 0.7, color = 'blue', size = 1.1) +
geom_hline(yintercept = quantile(p2.ws$irradiation, 0.75), lty = 'dashed', alpha = 0.7, color = 'red', size = 1.1)
) %>% layout(title = 'irradiation')ggplotly(p2.ws %>% ggplot(aes(date_time, module_temperature)) +
geom_line(size = 1.2) +
geom_hline(yintercept = quantile(p2.ws$module_temperature, 0.25), lty = 'dashed', alpha = 0.7, color = 'red', size = 1.1) +
geom_hline(yintercept = quantile(p2.ws$module_temperature, 0.50), lty = 'dashed', alpha = 0.7, color = 'blue', size = 1.1) +
geom_hline(yintercept = quantile(p2.ws$module_temperature, 0.75), lty = 'dashed', alpha = 0.7, color = 'red', size = 1.1)
) %>% layout(title = 'module_temperature')p2.ws %>% pivot_longer(ambient_temperature:module_temperature) %>% ggplot(aes(date_time, value, color = name)) + geom_line(size = 1.1) + facet_wrap(~name, ncol = 1, scales = 'free_y') + theme(legend.position = 'none')Anomoly Plot
library(anomalize)
# anomalize(data, target, method = c("iqr", "gesd"), alpha = 0.05, max_anoms = 0.2, verbose = FALSE)
# alpha: Controls the width of the "normal" range. Lower values are more conservative while higher values are less prone to incorrectly classifying "normal" observations.
# max_anoms: The maximum percent of anomalies permitted to be identified.
ggplotly(
p2.ws %>% arrange(date_time) %>%
time_decompose(ambient_temperature, method = 'twitter', merge = TRUE) %>%
anomalize(remainder, alpha = 0.15, method = 'gesd') %>%
time_recompose() %>%
plot_anomalies(
ncol = 2,
alpha_dots = 0.5,
alpha_circles = 0.5,
size_circles = 2,
time_recomposed = TRUE,
alpha_ribbon = 0.05
) + scale_y_continuous(labels = comma) +
labs(x = '', y = 'ambient_temperature', title = 'ambient_temperature')
)## Converting from tbl_df to tbl_time.
## Auto-index message: index = date_time
## frequency = 96 minutes
## median_span = 1629.5 minutes
ggplotly(
p2.ws %>% arrange(date_time) %>%
time_decompose(irradiation, method = 'twitter', merge = TRUE) %>%
anomalize(remainder, alpha = 0.05, method = 'gesd') %>%
time_recompose() %>%
plot_anomalies(
ncol = 2,
alpha_dots = 0.5,
alpha_circles = 0.5,
size_circles = 2,
time_recomposed = TRUE,
alpha_ribbon = 0.05
) + scale_y_continuous(labels = comma) +
labs(x = '', y = 'irradiation' , title = 'irradiation')
)## Converting from tbl_df to tbl_time.
## Auto-index message: index = date_time
## frequency = 96 minutes
## median_span = 1629.5 minutes
ggplotly(
p2.ws %>% arrange(date_time) %>%
time_decompose(module_temperature, method = 'twitter', merge = TRUE) %>%
anomalize(remainder, alpha = 0.05, method = 'gesd') %>%
time_recompose() %>%
plot_anomalies(
ncol = 2,
alpha_dots = 0.5,
alpha_circles = 0.5,
size_circles = 2,
time_recomposed = TRUE,
alpha_ribbon = 0.05
) + scale_y_continuous(labels = comma) +
labs(x = '', y = 'module_temperature', title = 'module_temperature')
)## Converting from tbl_df to tbl_time.
## Auto-index message: index = date_time
## frequency = 96 minutes
## median_span = 1629.5 minutes