Import data
# Read the Excel file of water point records (requires the readxl package)
library(readxl)
data <- read_excel("../00_data/myWaterData.xlsx")
data
## # A tibble: 473,315 × 13
## row_id lat_deg lon_deg report_date status_id water_source water_tech
## <dbl> <dbl> <dbl> <dttm> <chr> <chr> <chr>
## 1 3957 8.07 38.6 2017-04-06 00:00:00 y NA NA
## 2 33512 7.37 40.5 2020-08-04 00:00:00 y Protected Sp… NA
## 3 35125 0.773 34.9 2015-03-18 00:00:00 y Protected Sh… NA
## 4 37760 0.781 35.0 2015-03-18 00:00:00 y Borehole NA
## 5 38118 0.779 35.0 2015-03-18 00:00:00 y Protected Sh… NA
## 6 38501 0.308 34.1 2015-03-19 00:00:00 y Borehole NA
## 7 46357 0.419 34.3 2015-05-19 00:00:00 y Unprotected … NA
## 8 46535 0.444 34.3 2015-05-19 00:00:00 y Protected Sh… NA
## 9 46560 0.456 34.3 2015-05-19 00:00:00 y Protected Sh… NA
## 10 46782 0.467 34.3 2015-05-20 00:00:00 y Protected Sh… NA
## # ℹ 473,305 more rows
## # ℹ 6 more variables: facility_type <chr>, country_name <chr>,
## # install_year <chr>, installer <chr>, pay <chr>, status <chr>
State one question
- What does the water availability look like in areas of water
scarcity?
Plot data
library(ggplot2)
# Bar chart of the number of records in each facility_type category
ggplot(data, aes(x = facility_type)) +
  geom_bar()
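The bar chart shows relative sizes but not exact numbers. A minimal sketch, assuming the dplyr package is installed, tabulates the same counts the bars are built from and adds each category's share of all rows, which puts a number on a claim such as "the majority". The helper name `facility_shares()` and the toy data frame are hypothetical, used only for illustration.

```r
library(dplyr)

# Hypothetical helper: count rows per facility_type (largest first) and add
# each category's proportion of all rows.
facility_shares <- function(df) {
  df %>%
    count(facility_type, sort = TRUE) %>%
    mutate(share = n / sum(n))
}

# Toy data with made-up values, only to show the shape of the result
toy <- data.frame(
  facility_type = c("Improved", "Improved", "Unimproved", "No facilities")
)
facility_shares(toy)
```

On the real data, `facility_shares(data)` returns one row per facility_type, with missing values counted as their own `NA` row, so the counts can be read directly instead of estimated from the y-axis.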

Interpret
- It appears that the majority of facilities (the water wells) are
improved, although some areas still have no facilities or only
unimproved ones.
- The counts are very large (473,315 rows in total), so the values on
the y-axis are difficult to read.
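One way to make the counts easier to read is to format the y-axis labels with thousands separators. A minimal sketch, assuming the scales and forcats packages are installed (neither is loaded in the original code): `label_comma()` prints a count such as 200000 as "200,000" instead of R's default formatting, which for numbers this large can switch to scientific notation, and `fct_infreq()` orders the bars from most to least common. The helper name `plot_facility_types()` and the toy data are hypothetical.

```r
library(ggplot2)
library(scales)
library(forcats)

# Hypothetical helper: bar chart of facility_type with bars sorted by
# frequency and comma-formatted counts on the y-axis.
plot_facility_types <- function(df) {
  ggplot(df, aes(x = fct_infreq(facility_type))) +
    geom_bar() +
    scale_y_continuous(labels = label_comma()) +
    labs(x = "Facility type", y = "Count")
}

# Toy data with made-up values; on the real data use plot_facility_types(data)
toy <- data.frame(
  facility_type = c("Improved", "Improved", "Unimproved", "No facilities")
)
plot_facility_types(toy)
```

Sorting the bars by frequency also makes the "majority improved" reading visible at a glance, since the largest category is always the leftmost bar.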