Open these links in a separate tab by copy pasting the URLs in a different browser tab instead of reloading it in the same tab by clicking on the links. At the time of this writing, for some reason rpubs is trying to load the sites in an iframe which is refused by upstream sites due to the security risk it carries.
The analysis is performed on data scrapped from https://www.ke.com.pk/customer-services/load-shed-schedule/ on August 19, 2022. The schedule does not seem to update weekly or even monthly but it will certainly go stale within a few months (hopefully). To update the analysis, all one has to do is get the data from https://www.ke.com.pk/customer-services/load-shed-schedule/.
I created a small Python script to download the load shedding schedule. The code is accessible https://github.com/hashiromer/KE-Loadshedding-Schedule-Scrapper. This code can be used to download updated data which can be piped down to this script to cleanse and analyze it.
The data is available at https://drive.google.com/file/d/1-uqm1R1c5cvhgt2lhKvPXXfVpLDq9CIA/view?usp=sharing
It is possible for code in this file and/or the shared script to not work anymore due to changes in data format. If it is something important, contact me at the given email address, I may be able to update the code to get it working again.
library(tidyverse)
library(lubridate)
library(knitr)
library(kableExtra)
url<-"https://drive.google.com/uc?export=download&id=1-uqm1R1c5cvhgt2lhKvPXXfVpLDq9CIA"
data<-read_csv(url)
## Rows: 1445 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): Grids, Feeder_Name, Group, Category, 1st_Cycle, 2nd_Cycle, 3rd_Cyc...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
spec(data)
## cols(
## Grids = col_character(),
## Feeder_Name = col_character(),
## Group = col_character(),
## Category = col_character(),
## `1st_Cycle` = col_character(),
## `2nd_Cycle` = col_character(),
## `3rd_Cycle` = col_character(),
## `4th_Cycle` = col_character(),
## `5th_Cycle` = col_character(),
## `6th_Cycle` = col_character()
## )
All of the columns are parsed are characters. View the first few rows to get a sense of data.
data %>%
head(title = "Raw Data") %>%
kbl() %>%
kable_paper(bootstrap_options = "striped", full_width = F)
| Grids | Feeder_Name | Group | Category | 1st_Cycle | 2nd_Cycle | 3rd_Cycle | 4th_Cycle | 5th_Cycle | 6th_Cycle |
|---|---|---|---|---|---|---|---|---|---|
| CIVIC CENTER | KARNAL BASTI | 1A | HL | 0805~1105 | 1305~1605 | 1905~2135 | 2335~0105 |
|
|
| CIVIC CENTER | NDFC + (Ex-DOLMAN MALL) | 2 | Normal-LL | 1035~1205 | 0105~0305 |
|
|
|
|
| GARDEN EAST | RAM SUAMI | 4A | ML | 0505~0735 | 1005~1235 | 1505~1735 | 2135~0005 |
|
|
| WEST WHARF | KPT WEST WHARF | 6 | Normal-LL | 1635~1805 | 0305~0505 |
|
|
|
|
| FEDERAL B | SAREENA | 5 | Normal-LL | 1505~1635 | 0305~0505 |
|
|
|
|
| AIRPORT 2 | KDA PUMP HOUSE | 4 | Normal-LL | 1335~1505 | 0305~0505 |
|
|
|
|
data %>%
group_by(Category) %>%
count() %>%
arrange(desc(n)) %>%
head() %>%
kbl() %>%
kable_paper(bootstrap_options = "striped", full_width = F,position = "left")
| Category | n |
|---|---|
| Normal-LL | 610 |
| CON-LL | 341 |
| VHL | 206 |
| ML | 148 |
| HL | 140 |
Cleaning the data to parse columns into more specific data types than characters.
today<-"2022:08:19"
tomorrow<- "2022:08:20"
cleaned_dataSet<-data %>%
pivot_longer(contains("_Cycle"),names_to = "Cycle", values_to = "Timing") %>%
mutate(Feeder_Name=str_squish(Feeder_Name)) %>%
mutate(Cycle=as_factor(parse_number(Cycle))) %>%
# Nulls are represented by -, removing them
filter(Timing!="-") %>%
mutate(
across(
c(Grids,Feeder_Name),
str_to_title
)) %>%
mutate(
across(
c(Grids, Category),as_factor
)
) %>%
#There are a few instances where are separated by "-" instead of "~"
separate(Timing, into = c("Start", "End"), sep="~|-") %>%
#Formatting time from hourminute to hour:minute:seconds
mutate(
across(
c(Start,End), ~str_c(
str_sub(.x,1,2),":",str_sub(.x,3,4),":00"
)
)
) %>%
#Extracting hours, these are temporary columns to be used later down the chain
mutate(
across(
c(Start,End),
~as.numeric(str_sub(.x,1,2)),
.names="{col}_hour_temp"
)
) %>%
#The idea is that if load shedding ends at an hour earlier than starting hour, then its considered next day.
mutate(
EndDate=if_else(
End_hour_temp>Start_hour_temp, today,tomorrow
)
) %>%
#Creating datetime from dates and time columns
mutate(End=str_c(EndDate," ",End)) %>%
#Removing temporary columns
select(
!c(Start_hour_temp,End_hour_temp,EndDate)
) %>%
#Creating datetime from dates and time columns
mutate(Start= str_c(
today," ",Start
)) %>%
#Creating datetime object from string
mutate(
across(
c(Start,End),
~ymd_hms(.x,tz="Asia/Karachi")
)
) %>%
#Getting total hours of load shedding from start and end time
mutate(
span=as.numeric(as.duration(as.interval(Start,End)),"hours")
)
cleaned_dataSet %>%
head(title = "Clean Dataset") %>%
kbl() %>%
kable_paper(bootstrap_options = "striped", full_width = F)
| Grids | Feeder_Name | Group | Category | Cycle | Start | End | span |
|---|---|---|---|---|---|---|---|
| Civic Center | Karnal Basti | 1A | HL | 1 | 2022-08-19 08:05:00 | 2022-08-19 11:05:00 | 3.0 |
| Civic Center | Karnal Basti | 1A | HL | 2 | 2022-08-19 13:05:00 | 2022-08-19 16:05:00 | 3.0 |
| Civic Center | Karnal Basti | 1A | HL | 3 | 2022-08-19 19:05:00 | 2022-08-19 21:35:00 | 2.5 |
| Civic Center | Karnal Basti | 1A | HL | 4 | 2022-08-19 23:35:00 | 2022-08-20 01:05:00 | 1.5 |
| Civic Center | Ndfc + (Ex-Dolman Mall) | 2 | Normal-LL | 1 | 2022-08-19 10:35:00 | 2022-08-19 12:05:00 | 1.5 |
| Civic Center | Ndfc + (Ex-Dolman Mall) | 2 | Normal-LL | 2 | 2022-08-19 01:05:00 | 2022-08-19 03:05:00 | 2.0 |
total_feeders<-data %>%
nrow()
cleaned_dataSet %>%
ggplot(aes(x=span)) +
geom_histogram(bins=30)+
xlab("Hours")+
ylab("# Instances")
So, a typical load shedding lasts between 1 to 3 hours with most of
it being 1.5 hours. A lot of areas also get 3 hours of load
shedding.
cleaned_dataSet %>%
#Summing total hours of load shedding across all feeders
summarise(total=sum(span)*100/(total_feeders*24))
## # A tibble: 1 × 1
## total
## <dbl>
## 1 25.3
KE is only able to meet 75% of current total demand based on total hours of load shedding it is doing at the moment.
cleaned_dataSet %>%
group_by(Grids,Feeder_Name) %>%
summarise(Total=sum(span), .groups = "drop") %>%
# arrange(desc(Total)) %>%
group_by(Total) %>%
summarise(disticts=n()) %>%
ungroup() %>%
mutate(percentage=100*disticts/sum(disticts)) %>%
arrange(desc(disticts)) %>%
head() %>%
kbl() %>%
kable_paper(bootstrap_options = "striped", full_width = F,position = "left")
| Total | disticts | percentage |
|---|---|---|
| 3.500000 | 594 | 41.1072664 |
| 10.000000 | 494 | 34.1868512 |
| 5.000000 | 341 | 23.5986159 |
| 3.533333 | 6 | 0.4152249 |
| 3.566667 | 6 | 0.4152249 |
| 3.600000 | 3 | 0.2076125 |
34% of areas are facing 10 hours of daily load shedding.
41% of areas are getting 3.5 hours.
23% of areas get 5 hours of load shedding.
This distribution of load shedding is intentionally non uniform. Quoting from Karachi Electric’s website
K-Electric’s Segmented Load shed (SLS) policy divides feeders based on their loss profile which is determined by the T&D losses and recovery ratios in any particular area. High Loss areas in Karachi face up to 7.5 hours of load shedding (when energy demand is at its peak) whereas low loss areas face no load shedding whatsoever.
The information is somewhat outdated because these high loss areas are now facing up to 10 hours of load shedding, a step up from 7.5 hours and low loss areas are getting at least 3.5 hours of load shedding.
cleaned_dataSet %>%
group_by(Grids,Feeder_Name, Category) %>%
summarise(Total=sum(span), .groups = "drop") %>%
ggplot(aes(x=fct_reorder(Category, Total),y=Total))+
geom_boxplot()+
coord_flip()+
xlab("category of area")+
scale_y_continuous(limits = c(0, 10),breaks=seq(0, 10, by = 2))
That is intriguing!. Areas encoded by VHL,ML,HL consistently get 10 hours of daily load shedding. These are most probably the high loss areas for KE.
CON-LL gets 5 hours and Normal LL gets around ~3.5 hours.
# Night is somewhat defined arbitrarily For the sake of analysis,
# it means load shedding occurs between 1:00 AM to 6:00 AM. Change it to
# Suite your definition of night.
start_h<-1
end_h<-6
cleaned_dataSet %>%
mutate(start_hour=hour(Start)) %>%
mutate(is_night=between(start_hour,start_h,end_h)) %>%
mutate(
is_night= as_factor(if_else(
is_night==TRUE, "NightTime","DayTime"
))
) %>%
group_by(Grids,Feeder_Name,is_night) %>%
summarise(Occurences=n(), .groups = "drop") %>%
pivot_wider(names_from = is_night,values_from = Occurences,values_fill = 0) %>%
filter(NightTime >0 ) %>%
summarise(n=100*n()/total_feeders, .groups = "drop") %>%
head() %>%
kbl() %>%
kable_paper(bootstrap_options = "striped", full_width = F)
| n |
|---|
| 81.79931 |
81% of Karachi faces load shedding at night at least once.
cleaned_dataSet %>%
mutate(start_hour=hour(Start)) %>%
mutate(is_night=between(start_hour,start_h,end_h)) %>%
mutate(
Day_Night= as_factor(if_else(
is_night==TRUE, "NightTime","DayTime"
))
) %>%
select(-c(is_night)) %>%
group_by(Grids,Feeder_Name,Day_Night) %>%
summarise(Occurences=n(), Category=first(Category), .groups = "drop") %>%
ungroup() %>%
pivot_wider(names_from = Day_Night,values_from = Occurences,values_fill = 0) %>%
mutate(get_at_night=as.numeric(NightTime>0)) %>%
group_by(Category) %>%
summarise(count=100*sum(get_at_night)/n()) %>%
ungroup() %>%
arrange(desc(count)) %>%
head() %>%
kbl() %>%
kable_paper(bootstrap_options = "striped", full_width = F,position = "left")
| Category | count |
|---|---|
| Normal-LL | 100.00000 |
| CON-LL | 100.00000 |
| HL | 62.14286 |
| ML | 43.91892 |
| VHL | 38.34951 |
This is rather surprising. A lot of areas worst hit by power shortage (aka having 10 hours of daily load shedding don’t have load shedding at night) whereas ALL areas moderately affected by load shedding faces it at night, all of them.
Surely 3.5 and 5 hours of load shedding can be evenly distributed at daytime without interrupting people at night. This should be possible unless there are constraints not visible in the dataSet.
cleaned_dataSet %>%
mutate(start_hour=hour(Start)) %>%
mutate(is_night=between(start_hour,start_h,end_h)) %>%
mutate(
Day_Night= as_factor(if_else(
is_night==TRUE, "NightTime","DayTime"
))
) %>%
select(-c(is_night)) %>%
group_by(Grids,Feeder_Name,Day_Night) %>%
summarise(Occurences=n(), Category=first(Category), .groups = "drop") %>%
pivot_wider(names_from = Day_Night,values_from = Occurences,values_fill = 0) %>%
mutate(get_at_night=as.numeric(NightTime>1)) %>%
group_by(Category) %>%
summarise(count=100*sum(get_at_night)/n()) %>%
ungroup() %>%
arrange(desc(count)) %>%
head() %>%
kbl() %>%
kable_paper(bootstrap_options = "striped", full_width = F,position = "left")
| Category | count |
|---|---|
| CON-LL | 69.79472 |
| HL | 0.00000 |
| Normal-LL | 0.00000 |
| ML | 0.00000 |
| VHL | 0.00000 |
This is even more interesting, CON-LL gets a total of 5 hours of load
shedding. However around 70% of these areas gets load shedding twice at
night!!. Surely these instances of load shedding should be moved to
daytime.
That’s it for this analysis. I might take it up again if something
interesting comes up.