How was the data collected?

Open these links in a separate tab by copy pasting the URLs in a different browser tab instead of reloading it in the same tab by clicking on the links. At the time of this writing, for some reason rpubs is trying to load the sites in an iframe which is refused by upstream sites due to the security risk it carries.

The analysis is performed on data scrapped from https://www.ke.com.pk/customer-services/load-shed-schedule/ on August 19, 2022. The schedule does not seem to update weekly or even monthly but it will certainly go stale within a few months (hopefully). To update the analysis, all one has to do is get the data from https://www.ke.com.pk/customer-services/load-shed-schedule/.

I created a small Python script to download the load shedding schedule. The code is accessible https://github.com/hashiromer/KE-Loadshedding-Schedule-Scrapper. This code can be used to download updated data which can be piped down to this script to cleanse and analyze it.

The data is available at https://drive.google.com/file/d/1-uqm1R1c5cvhgt2lhKvPXXfVpLDq9CIA/view?usp=sharing

It is possible for code in this file and/or the shared script to not work anymore due to changes in data format. If it is something important, contact me at the given email address, I may be able to update the code to get it working again.


Loading packages for data analysis

library(tidyverse)
library(lubridate)
library(knitr)
library(kableExtra)


Getting data

url<-"https://drive.google.com/uc?export=download&id=1-uqm1R1c5cvhgt2lhKvPXXfVpLDq9CIA"

data<-read_csv(url)
## Rows: 1445 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): Grids, Feeder_Name, Group, Category, 1st_Cycle, 2nd_Cycle, 3rd_Cyc...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
spec(data)
## cols(
##   Grids = col_character(),
##   Feeder_Name = col_character(),
##   Group = col_character(),
##   Category = col_character(),
##   `1st_Cycle` = col_character(),
##   `2nd_Cycle` = col_character(),
##   `3rd_Cycle` = col_character(),
##   `4th_Cycle` = col_character(),
##   `5th_Cycle` = col_character(),
##   `6th_Cycle` = col_character()
## )

All of the columns are parsed are characters. View the first few rows to get a sense of data.

Raw Data

  data %>% 
    head(title = "Raw Data") %>% 
    kbl() %>%
    kable_paper(bootstrap_options = "striped", full_width = F)
Grids Feeder_Name Group Category 1st_Cycle 2nd_Cycle 3rd_Cycle 4th_Cycle 5th_Cycle 6th_Cycle
CIVIC CENTER KARNAL BASTI 1A HL 0805~1105 1305~1605 1905~2135 2335~0105
CIVIC CENTER NDFC + (Ex-DOLMAN MALL) 2 Normal-LL 1035~1205 0105~0305
GARDEN EAST RAM SUAMI 4A ML 0505~0735 1005~1235 1505~1735 2135~0005
WEST WHARF KPT WEST WHARF 6 Normal-LL 1635~1805 0305~0505
FEDERAL B SAREENA 5 Normal-LL 1505~1635 0305~0505
AIRPORT 2 KDA PUMP HOUSE 4 Normal-LL 1335~1505 0305~0505


Number of areas(Feeders) by category.

data %>% 
  group_by(Category) %>% 
  count() %>% 
  arrange(desc(n)) %>% 
  
  head() %>% 
  kbl() %>%
  kable_paper(bootstrap_options = "striped", full_width = F,position = "left")
Category n
Normal-LL 610
CON-LL 341
VHL 206
ML 148
HL 140


Tidying up data

Cleaning the data to parse columns into more specific data types than characters.

today<-"2022:08:19"
tomorrow<- "2022:08:20"

cleaned_dataSet<-data %>% 
  pivot_longer(contains("_Cycle"),names_to = "Cycle", values_to = "Timing") %>%
  mutate(Feeder_Name=str_squish(Feeder_Name)) %>% 
  mutate(Cycle=as_factor(parse_number(Cycle))) %>%
  # Nulls are represented by -, removing them
  filter(Timing!="-") %>%
    mutate(
    across(
      c(Grids,Feeder_Name),
      str_to_title
    )) %>% 
  mutate(
    across(
      c(Grids, Category),as_factor
    )
  ) %>%
  #There are a few instances where are separated by "-" instead of "~"
  separate(Timing, into = c("Start", "End"), sep="~|-") %>%
  #Formatting time from hourminute to hour:minute:seconds
  mutate(
    across(
      c(Start,End), ~str_c(
        str_sub(.x,1,2),":",str_sub(.x,3,4),":00"
        )
    )
  ) %>%
  #Extracting hours, these are temporary columns to be used later down the chain
  mutate(
    across(
      c(Start,End),
      ~as.numeric(str_sub(.x,1,2)),
      .names="{col}_hour_temp"
    )
  ) %>% 
  #The idea is that if load shedding ends at an hour earlier than starting hour, then its considered next day.
    mutate(
    EndDate=if_else(
      End_hour_temp>Start_hour_temp, today,tomorrow
    )
  ) %>% 
  #Creating datetime from dates and time columns
  mutate(End=str_c(EndDate," ",End)) %>% 
  #Removing temporary columns
  select(
    !c(Start_hour_temp,End_hour_temp,EndDate)
  ) %>% 
  #Creating datetime from dates and time columns
  mutate(Start= str_c(
    today," ",Start
  )) %>% 
  #Creating datetime object from string
  mutate(
    across(
      c(Start,End),
      ~ymd_hms(.x,tz="Asia/Karachi")
      )
  ) %>% 
  #Getting total hours of load shedding from start and end time
  mutate(
    span=as.numeric(as.duration(as.interval(Start,End)),"hours")
  ) 


Viewing Clean datset

cleaned_dataSet %>% 
  head(title = "Clean Dataset") %>% 
  kbl() %>%
  kable_paper(bootstrap_options = "striped", full_width = F)
Grids Feeder_Name Group Category Cycle Start End span
Civic Center Karnal Basti 1A HL 1 2022-08-19 08:05:00 2022-08-19 11:05:00 3.0
Civic Center Karnal Basti 1A HL 2 2022-08-19 13:05:00 2022-08-19 16:05:00 3.0
Civic Center Karnal Basti 1A HL 3 2022-08-19 19:05:00 2022-08-19 21:35:00 2.5
Civic Center Karnal Basti 1A HL 4 2022-08-19 23:35:00 2022-08-20 01:05:00 1.5
Civic Center Ndfc + (Ex-Dolman Mall) 2 Normal-LL 1 2022-08-19 10:35:00 2022-08-19 12:05:00 1.5
Civic Center Ndfc + (Ex-Dolman Mall) 2 Normal-LL 2 2022-08-19 01:05:00 2022-08-19 03:05:00 2.0


Total Feeders in Karachi operated by KE

total_feeders<-data %>% 
  nrow()


Typical duration of load shedding

cleaned_dataSet %>% 
  ggplot(aes(x=span)) +
  geom_histogram(bins=30)+
  xlab("Hours")+
  ylab("# Instances")

So, a typical load shedding lasts between 1 to 3 hours with most of it being 1.5 hours. A lot of areas also get 3 hours of load shedding.

Deficit between power supply and demand.

cleaned_dataSet %>% 
  #Summing total hours of load shedding across all feeders
  summarise(total=sum(span)*100/(total_feeders*24))
## # A tibble: 1 × 1
##   total
##   <dbl>
## 1  25.3

KE is only able to meet 75% of current total demand based on total hours of load shedding it is doing at the moment.


Total load shedding in a day

cleaned_dataSet %>% 
  group_by(Grids,Feeder_Name) %>% 
  summarise(Total=sum(span), .groups = "drop") %>% 
  # arrange(desc(Total)) %>% 
  group_by(Total) %>% 
  summarise(disticts=n()) %>%
  ungroup() %>% 
  mutate(percentage=100*disticts/sum(disticts)) %>% 
  arrange(desc(disticts)) %>% 
  head() %>% 
  kbl() %>%
  kable_paper(bootstrap_options = "striped", full_width = F,position = "left")
Total disticts percentage
3.500000 594 41.1072664
10.000000 494 34.1868512
5.000000 341 23.5986159
3.533333 6 0.4152249
3.566667 6 0.4152249
3.600000 3 0.2076125

  • 34% of areas are facing 10 hours of daily load shedding.

  • 41% of areas are getting 3.5 hours.

  • 23% of areas get 5 hours of load shedding.


This distribution of load shedding is intentionally non uniform. Quoting from Karachi Electric’s website

K-Electric’s Segmented Load shed (SLS) policy divides feeders based on their loss profile which is determined by the T&D losses and recovery ratios in any particular area. High Loss areas in Karachi face up to 7.5 hours of load shedding (when energy demand is at its peak) whereas low loss areas face no load shedding whatsoever.

The information is somewhat outdated because these high loss areas are now facing up to 10 hours of load shedding, a step up from 7.5 hours and low loss areas are getting at least 3.5 hours of load shedding.


Total hours of load shedding by category of areas.

cleaned_dataSet %>% 
  group_by(Grids,Feeder_Name, Category) %>% 
  summarise(Total=sum(span), .groups = "drop") %>% 
  ggplot(aes(x=fct_reorder(Category, Total),y=Total))+
  geom_boxplot()+
  coord_flip()+
  xlab("category of area")+
  scale_y_continuous(limits = c(0, 10),breaks=seq(0, 10, by = 2))

That is intriguing!. Areas encoded by VHL,ML,HL consistently get 10 hours of daily load shedding. These are most probably the high loss areas for KE.

CON-LL gets 5 hours and Normal LL gets around ~3.5 hours.


How many areas face load shedding at night?

# Night is somewhat defined arbitrarily For the sake of analysis,
# it means load shedding occurs between 1:00 AM to 6:00 AM. Change it to
# Suite your definition of night.
start_h<-1
end_h<-6
cleaned_dataSet %>% 
  mutate(start_hour=hour(Start)) %>% 
  mutate(is_night=between(start_hour,start_h,end_h)) %>% 
    mutate(
    is_night= as_factor(if_else(
      is_night==TRUE, "NightTime","DayTime"
    ))
  ) %>% 

  group_by(Grids,Feeder_Name,is_night) %>%
  summarise(Occurences=n(), .groups = "drop") %>%
  pivot_wider(names_from = is_night,values_from = Occurences,values_fill = 0) %>% 
  filter(NightTime >0 ) %>% 
  summarise(n=100*n()/total_feeders, .groups = "drop") %>% 
    head() %>% 
  kbl() %>%
  kable_paper(bootstrap_options = "striped", full_width = F)
n
81.79931

81% of Karachi faces load shedding at night at least once.


How is it distributed by category of areas?

cleaned_dataSet %>% 
  mutate(start_hour=hour(Start)) %>% 
  mutate(is_night=between(start_hour,start_h,end_h)) %>% 
  mutate(
    Day_Night= as_factor(if_else(
      is_night==TRUE, "NightTime","DayTime"
    ))
  ) %>% 
  select(-c(is_night)) %>% 
  group_by(Grids,Feeder_Name,Day_Night) %>%
  summarise(Occurences=n(), Category=first(Category), .groups = "drop") %>% 
  ungroup() %>%
  pivot_wider(names_from = Day_Night,values_from = Occurences,values_fill = 0) %>% 
  mutate(get_at_night=as.numeric(NightTime>0)) %>% 
  group_by(Category) %>%
  summarise(count=100*sum(get_at_night)/n()) %>% 
  ungroup() %>% 
  arrange(desc(count)) %>% 
  
  head() %>% 
  kbl() %>%
  kable_paper(bootstrap_options = "striped", full_width = F,position = "left")
Category count
Normal-LL 100.00000
CON-LL 100.00000
HL 62.14286
ML 43.91892
VHL 38.34951


This is rather surprising. A lot of areas worst hit by power shortage (aka having 10 hours of daily load shedding don’t have load shedding at night) whereas ALL areas moderately affected by load shedding faces it at night, all of them.

Surely 3.5 and 5 hours of load shedding can be evenly distributed at daytime without interrupting people at night. This should be possible unless there are constraints not visible in the dataSet.


Load shedding twice at night?

cleaned_dataSet %>% 
  mutate(start_hour=hour(Start)) %>% 
  mutate(is_night=between(start_hour,start_h,end_h)) %>% 
  mutate(
    Day_Night= as_factor(if_else(
      is_night==TRUE, "NightTime","DayTime"
    ))
  ) %>% 
  select(-c(is_night)) %>% 
  group_by(Grids,Feeder_Name,Day_Night) %>%
  summarise(Occurences=n(), Category=first(Category), .groups = "drop") %>% 
  pivot_wider(names_from = Day_Night,values_from = Occurences,values_fill = 0) %>% 
  mutate(get_at_night=as.numeric(NightTime>1)) %>% 
  group_by(Category) %>%
  summarise(count=100*sum(get_at_night)/n()) %>% 
  ungroup() %>% 
  arrange(desc(count)) %>% 
  
  head() %>% 
  kbl() %>%
  kable_paper(bootstrap_options = "striped", full_width = F,position = "left")
Category count
CON-LL 69.79472
HL 0.00000
Normal-LL 0.00000
ML 0.00000
VHL 0.00000


This is even more interesting, CON-LL gets a total of 5 hours of load shedding. However around 70% of these areas gets load shedding twice at night!!. Surely these instances of load shedding should be moved to daytime.

That’s it for this analysis. I might take it up again if something interesting comes up.