05/09/2020

DATA COSTS IN SOUTH AFRICA

According to EWN, the cost of internet connectivity (data) is high in South Africa compared to other African countries. On average, 1 Gigabyte of data costs $7.19 (~R122.23), while the same amount of data costs approximately 3 dollars in Kenya.

I have been using rain since July 2018. After 27 months, it is worth exploring whether the start-up’s data package is cost-effective. To obtain the data, I downloaded my invoice for each month. Unfortunately, the invoices are rendered as PDFs.

Importing Text

The initial phase of the analysis began with reading in all 27 files at once. To accomplish this, the list.files function was applied. Thereafter, lapply was used along with readtext to read each file. The texts were then combined using the tidyverse workhorse bind_rows. Finally, unnest_paragraphs split the combined text into one row per line.

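library(dplyr)     # %>% and bind_rows
library(readtext)  # readtext
library(tidytext)  # unnest_paragraphs

# List every PDF invoice in the working directory
Files <- list.files(pattern = "\\.pdf$")

# Read each invoice and stack the results into one data frame
Rain_Invoices <- lapply(Files, readtext) %>% 
  bind_rows()

# Split the invoice text into one row per line
Rain_Invoices_Rows <- Rain_Invoices %>% 
  unnest_paragraphs("paragraphs", text, paragraph_break = "\n")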

Wrangling Text

To convert the data to a data.frame, the rows referring to data used and amounts charged were extracted using functions from the stringr and stringi packages. In particular, str_detect and stri_extract were helpful for picking out specific lines through regular expressions.

library(stringr)  # str_detect and str_extract

# Keep only the pay-as-you-use lines, then extract each field with a regular expression
Out_of_Bundle_1 <- Rain_Invoices_Rows %>% 
  as_tibble() %>% 
  filter(str_detect(paragraphs, "pay after you use|pay as you use")) %>% 
  mutate(cellphone_number = stringi::stri_extract(paragraphs, regex = "\\d{11}"),
    description = str_extract(paragraphs, "pay after you use|pay as you use"),
    # the decimal point is escaped so it only matches a literal "."
    data_usage_gb = as.double(str_extract(paragraphs, "\\s\\d\\.\\d{4}")),
    # file name without its extension
    period = gsub(".pdf", "", doc_id, fixed = TRUE),
    # date each invoice on the 28th of its month
    month = lubridate::dmy(gsub("_", "-", paste(28, period, sep = "_"))),
    # usage charged at 50 per Gigabyte
    cost = data_usage_gb * 50) %>% 
  select(-c(paragraphs, doc_id, period))
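
To see what each regular expression picks out, here is a minimal sketch on a single made-up invoice line. The line text, the phone number and the file name are all invented for illustration, and the real invoice layout may well differ. The decimal point is escaped in this sketch so that it only matches a literal dot.

```r
library(stringr)

# Hypothetical invoice line and file name, invented for illustration only
line   <- "27811234567 pay as you use 3.2514 gb"
doc_id <- "July_2018.pdf"

# An 11-digit run is taken to be the cellphone number
cellphone_number <- str_extract(line, "\\d{11}")

# Either wording of the out-of-bundle description
description <- str_extract(line, "pay after you use|pay as you use")

# A space, one digit, a literal dot and four decimals, converted to a number
data_usage_gb <- as.double(str_extract(line, "\\s\\d\\.\\d{4}"))

# Build a date on the 28th of the invoice month from the file name
period <- str_remove(doc_id, "\\.pdf$")
month  <- lubridate::dmy(paste(28, str_replace_all(period, "_", "-"), sep = "-"))
```

Note that the usage pattern only allows a single digit before the decimal point, so a line reporting 10 Gigabytes or more would need a wider pattern such as "\\d+" in that position.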

Shiny App

The data indicates an average of $15.19 spent for approximately 176 Gigabytes per month, which translates to a rather favourable rate of roughly $0.085 (about 8.5 US cents) per Gigabyte. Interestingly, the Shiny App plot indicates an upward trend in expenses since July 2018. To explore the app, visit this link