How was the data collected?

Open these links in a separate tab by copy pasting the URLs in a different browser tab instead of reloading it in the same tab by clicking on the links. At the time of this writing, for some reason rpubs is trying to load the sites in an iframe which is refused by upstream sites due to the security risk it carries.

The analysis is performed on data scrapped from https://www.ke.com.pk/customer-services/load-shed-schedule/ on August 19, 2022. The schedule does not seem to update weekly or even monthly but it will certainly go stale within a few months (hopefully). To update the analysis, all one has to do is get the data from https://www.ke.com.pk/customer-services/load-shed-schedule/.

I created a small Python script to download the load shedding schedule. The code is accessible https://github.com/hashiromer/KE-Loadshedding-Schedule-Scrapper. This code can be used to download updated data which can be piped down to this script to cleanse and analyze it.

The data is available at https://drive.google.com/file/d/1-uqm1R1c5cvhgt2lhKvPXXfVpLDq9CIA/view?usp=sharing

It is possible for code in this file and/or the shared script to not work anymore due to changes in data format. If it is something important, contact me at the given email address, I may be able to update the code to get it working again.

Loading packages for data analysis

library(tidyverse)
library(lubridate)
library(knitr)
library(kableExtra)

Getting data

url<-"https://drive.google.com/uc?export=download&id=1-uqm1R1c5cvhgt2lhKvPXXfVpLDq9CIA"

data<-read_csv(url)

## Rows: 1445 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): Grids, Feeder_Name, Group, Category, 1st_Cycle, 2nd_Cycle, 3rd_Cyc...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

spec(data)

## cols(
##   Grids = col_character(),
##   Feeder_Name = col_character(),
##   Group = col_character(),
##   Category = col_character(),
##   `1st_Cycle` = col_character(),
##   `2nd_Cycle` = col_character(),
##   `3rd_Cycle` = col_character(),
##   `4th_Cycle` = col_character(),
##   `5th_Cycle` = col_character(),
##   `6th_Cycle` = col_character()
## )

All of the columns are parsed are characters. View the first few rows to get a sense of data.

Raw Data

  data %>% 
    head(title = "Raw Data") %>% 
    kbl() %>%
    kable_paper(bootstrap_options = "striped", full_width = F)

Grids	Feeder_Name	Group	Category	1st_Cycle	2nd_Cycle	3rd_Cycle	4th_Cycle
CIVIC CENTER	KARNAL BASTI	1A	HL	0805~1105	1305~1605	1905~2135	2335~0105
CIVIC CENTER	NDFC + (Ex-DOLMAN MALL)	2	Normal-LL	1035~1205	0105~0305
GARDEN EAST	RAM SUAMI	4A	ML	0505~0735	1005~1235	1505~1735	2135~0005
WEST WHARF	KPT WEST WHARF	6	Normal-LL	1635~1805	0305~0505
FEDERAL B	SAREENA	5	Normal-LL	1505~1635	0305~0505
AIRPORT 2	KDA PUMP HOUSE	4	Normal-LL	1335~1505	0305~0505

Number of areas(Feeders) by category.

data %>% 
  group_by(Category) %>% 
  count() %>% 
  arrange(desc(n)) %>% 
  
  head() %>% 
  kbl() %>%
  kable_paper(bootstrap_options = "striped", full_width = F,position = "left")

Category	n
Normal-LL	610
CON-LL	341
VHL	206
ML	148
HL	140

Tidying up data

Cleaning the data to parse columns into more specific data types than characters.

today<-"2022:08:19"
tomorrow<- "2022:08:20"

cleaned_dataSet<-data %>% 
  pivot_longer(contains("_Cycle"),names_to = "Cycle", values_to = "Timing") %>%
  mutate(Feeder_Name=str_squish(Feeder_Name)) %>% 
  mutate(Cycle=as_factor(parse_number(Cycle))) %>%
  # Nulls are represented by -, removing them
  filter(Timing!="-") %>%
    mutate(
    across(
      c(Grids,Feeder_Name),
      str_to_title
    )) %>% 
  mutate(
    across(
      c(Grids, Category),as_factor
    )
  ) %>%
  #There are a few instances where are separated by "-" instead of "~"
  separate(Timing, into = c("Start", "End"), sep="~|-") %>%
  #Formatting time from hourminute to hour:minute:seconds
  mutate(
    across(
      c(Start,End), ~str_c(
        str_sub(.x,1,2),":",str_sub(.x,3,4),":00"
        )
    )
  ) %>%
  #Extracting hours, these are temporary columns to be used later down the chain
  mutate(
    across(
      c(Start,End),
      ~as.numeric(str_sub(.x,1,2)),
      .names="{col}_hour_temp"
    )
  ) %>% 
  #The idea is that if load shedding ends at an hour earlier than starting hour, then its considered next day.
    mutate(
    EndDate=if_else(
      End_hour_temp>Start_hour_temp, today,tomorrow
    )
  ) %>% 
  #Creating datetime from dates and time columns
  mutate(End=str_c(EndDate," ",End)) %>% 
  #Removing temporary columns
  select(
    !c(Start_hour_temp,End_hour_temp,EndDate)
  ) %>% 
  #Creating datetime from dates and time columns
  mutate(Start= str_c(
    today," ",Start
  )) %>% 
  #Creating datetime object from string
  mutate(
    across(
      c(Start,End),
      ~ymd_hms(.x,tz="Asia/Karachi")
      )
  ) %>% 
  #Getting total hours of load shedding from start and end time
  mutate(
    span=as.numeric(as.duration(as.interval(Start,End)),"hours")
  )

Viewing Clean datset

cleaned_dataSet %>% 
  head(title = "Clean Dataset") %>% 
  kbl() %>%
  kable_paper(bootstrap_options = "striped", full_width = F)

Grids	Feeder_Name	Group	Category	Cycle	Start	End	span
Civic Center	Karnal Basti	1A	HL	1	2022-08-19 08:05:00	2022-08-19 11:05:00	3.0
Civic Center	Karnal Basti	1A	HL	2	2022-08-19 13:05:00	2022-08-19 16:05:00	3.0
Civic Center	Karnal Basti	1A	HL	3	2022-08-19 19:05:00	2022-08-19 21:35:00	2.5
Civic Center	Karnal Basti	1A	HL	4	2022-08-19 23:35:00	2022-08-20 01:05:00	1.5
Civic Center	Ndfc + (Ex-Dolman Mall)	2	Normal-LL	1	2022-08-19 10:35:00	2022-08-19 12:05:00	1.5
Civic Center	Ndfc + (Ex-Dolman Mall)	2	Normal-LL	2	2022-08-19 01:05:00	2022-08-19 03:05:00	2.0

Total Feeders in Karachi operated by KE

total_feeders<-data %>% 
  nrow()

Typical duration of load shedding

cleaned_dataSet %>% 
  ggplot(aes(x=span)) +
  geom_histogram(bins=30)+
  xlab("Hours")+
  ylab("# Instances")

So, a typical load shedding lasts between 1 to 3 hours with most of it being 1.5 hours. A lot of areas also get 3 hours of load shedding.

Deficit between power supply and demand.

cleaned_dataSet %>% 
  #Summing total hours of load shedding across all feeders
  summarise(total=sum(span)*100/(total_feeders*24))

## # A tibble: 1 × 1
##   total
##   <dbl>
## 1  25.3

KE is only able to meet 75% of current total demand based on total hours of load shedding it is doing at the moment.

Total load shedding in a day

cleaned_dataSet %>% 
  group_by(Grids,Feeder_Name) %>% 
  summarise(Total=sum(span), .groups = "drop") %>% 
  # arrange(desc(Total)) %>% 
  group_by(Total) %>% 
  summarise(disticts=n()) %>%
  ungroup() %>% 
  mutate(percentage=100*disticts/sum(disticts)) %>% 
  arrange(desc(disticts)) %>% 
  head() %>% 
  kbl() %>%
  kable_paper(bootstrap_options = "striped", full_width = F,position = "left")

Total	disticts	percentage
3.500000	594	41.1072664
10.000000	494	34.1868512
5.000000	341	23.5986159
3.533333	6	0.4152249
3.566667	6	0.4152249
3.600000	3	0.2076125

34% of areas are facing 10 hours of daily load shedding.
41% of areas are getting 3.5 hours.
23% of areas get 5 hours of load shedding.

This distribution of load shedding is intentionally non uniform. Quoting from Karachi Electric’s website

K-Electric’s Segmented Load shed (SLS) policy divides feeders based on their loss profile which is determined by the T&D losses and recovery ratios in any particular area. High Loss areas in Karachi face up to 7.5 hours of load shedding (when energy demand is at its peak) whereas low loss areas face no load shedding whatsoever.

The information is somewhat outdated because these high loss areas are now facing up to 10 hours of load shedding, a step up from 7.5 hours and low loss areas are getting at least 3.5 hours of load shedding.

Total hours of load shedding by category of areas.

cleaned_dataSet %>% 
  group_by(Grids,Feeder_Name, Category) %>% 
  summarise(Total=sum(span), .groups = "drop") %>% 
  ggplot(aes(x=fct_reorder(Category, Total),y=Total))+
  geom_boxplot()+
  coord_flip()+
  xlab("category of area")+
  scale_y_continuous(limits = c(0, 10),breaks=seq(0, 10, by = 2))

That is intriguing!. Areas encoded by VHL,ML,HL consistently get 10 hours of daily load shedding. These are most probably the high loss areas for KE.

CON-LL gets 5 hours and Normal LL gets around ~3.5 hours.

How many areas face load shedding at night?

# Night is somewhat defined arbitrarily For the sake of analysis,
# it means load shedding occurs between 1:00 AM to 6:00 AM. Change it to
# Suite your definition of night.
start_h<-1
end_h<-6
cleaned_dataSet %>% 
  mutate(start_hour=hour(Start)) %>% 
  mutate(is_night=between(start_hour,start_h,end_h)) %>% 
    mutate(
    is_night= as_factor(if_else(
      is_night==TRUE, "NightTime","DayTime"
    ))
  ) %>% 

  group_by(Grids,Feeder_Name,is_night) %>%
  summarise(Occurences=n(), .groups = "drop") %>%
  pivot_wider(names_from = is_night,values_from = Occurences,values_fill = 0) %>% 
  filter(NightTime >0 ) %>% 
  summarise(n=100*n()/total_feeders, .groups = "drop") %>% 
    head() %>% 
  kbl() %>%
  kable_paper(bootstrap_options = "striped", full_width = F)

n
81.79931

81% of Karachi faces load shedding at night at least once.

How is it distributed by category of areas?

cleaned_dataSet %>% 
  mutate(start_hour=hour(Start)) %>% 
  mutate(is_night=between(start_hour,start_h,end_h)) %>% 
  mutate(
    Day_Night= as_factor(if_else(
      is_night==TRUE, "NightTime","DayTime"
    ))
  ) %>% 
  select(-c(is_night)) %>% 
  group_by(Grids,Feeder_Name,Day_Night) %>%
  summarise(Occurences=n(), Category=first(Category), .groups = "drop") %>% 
  ungroup() %>%
  pivot_wider(names_from = Day_Night,values_from = Occurences,values_fill = 0) %>% 
  mutate(get_at_night=as.numeric(NightTime>0)) %>% 
  group_by(Category) %>%
  summarise(count=100*sum(get_at_night)/n()) %>% 
  ungroup() %>% 
  arrange(desc(count)) %>% 
  
  head() %>% 
  kbl() %>%
  kable_paper(bootstrap_options = "striped", full_width = F,position = "left")

Category	count
Normal-LL	100.00000
CON-LL	100.00000
HL	62.14286
ML	43.91892
VHL	38.34951

This is rather surprising. A lot of areas worst hit by power shortage (aka having 10 hours of daily load shedding don’t have load shedding at night) whereas ALL areas moderately affected by load shedding faces it at night, all of them.

Surely 3.5 and 5 hours of load shedding can be evenly distributed at daytime without interrupting people at night. This should be possible unless there are constraints not visible in the dataSet.

Load shedding twice at night?

cleaned_dataSet %>% 
  mutate(start_hour=hour(Start)) %>% 
  mutate(is_night=between(start_hour,start_h,end_h)) %>% 
  mutate(
    Day_Night= as_factor(if_else(
      is_night==TRUE, "NightTime","DayTime"
    ))
  ) %>% 
  select(-c(is_night)) %>% 
  group_by(Grids,Feeder_Name,Day_Night) %>%
  summarise(Occurences=n(), Category=first(Category), .groups = "drop") %>% 
  pivot_wider(names_from = Day_Night,values_from = Occurences,values_fill = 0) %>% 
  mutate(get_at_night=as.numeric(NightTime>1)) %>% 
  group_by(Category) %>%
  summarise(count=100*sum(get_at_night)/n()) %>% 
  ungroup() %>% 
  arrange(desc(count)) %>% 
  
  head() %>% 
  kbl() %>%
  kable_paper(bootstrap_options = "striped", full_width = F,position = "left")

Category	count
CON-LL	69.79472
HL	0.00000
Normal-LL	0.00000
ML	0.00000
VHL	0.00000

This is even more interesting, CON-LL gets a total of 5 hours of load shedding. However around 70% of these areas gets load shedding twice at night!!. Surely these instances of load shedding should be moved to daytime.

That’s it for this analysis. I might take it up again if something interesting comes up.

Analysis of Load Shedding Schedule in Karachi

Hashir Omer

2022-08-18