In Part I, we’ll explore a dataset of speed cameras locations on federal highways of Brazil that be merged, in Part II, with a second dataset containing reported accidents on the sames federal highways.
R has a huge number of spatial data packages. In this tutorial we’ll use:
rgdal: R’s interface to the popular library gdal.
rgeos: R’s interface to geos.
tidyverse: for data manipulation and plotting.
ggmap: that extends the plotting package ggplot2 for maps.
readxl: package for manipulate excel files.
So, make sure you have all theses packages installed. To test, try load a specific package, ie. library(ggplot2). If you get a error message, you will need to install the package with install.packages(), ie. install.packages("ggplot2").
The two datasets used for this tutorial can be download from DNIT website. DNIT is a federal governement body responsible for highway maintenance. For our first dataset, access “http://www.dnit.gov.br/rodovias/operacoes-rodoviarias/controle-de-velocidade/ConsultaFaixa20180611144555.xlsx” for the file.
“https://189.9.128.64/mapas-multimodais/shapefiles/shapefiles.rar”
For the importing process, we’ll use read_excel from the readxl package. The argument col_types ensure that the right format for each variable will be used.
speed_cameras_df <- read_excel("ConsultaFaixa20180611144555.xlsx", col_types = c("text", "text", "numeric",
"text", "text", "text", "text", "text",
"text", "text", "date", "date", "text",
"text", "date", "date", "date", "text",
"text", "text", "date", "date", "date",
"numeric", "numeric", "numeric", "numeric",
"numeric", "text"))In our first step, we’ll rename the variables for friendly versions, for avoid space between words mostly.
Now, checking the columns, we have State, Highway, Km, City_name and Coordinates as our main variables, but Company, Track_situation, Stoppage_reason, Light_vehicle_speed and Heavy_vehicle_speed can generate some interisintg analysis as well. Coordinates is a character (text) column and latitude and longitude are united by “/” - we’ll fix this later.
colnames(speed_cameras_df) <- c("State", "Highway", "Km", "City_name", "Track", "Equipment", "Equip_type",
"Notice", "Company", "Contract", "Contract_start", "Contract_end",
"Track_situation", "Instalation_note", "Emission_Date", "Operation_Start",
"Stoppage_start", "Stoppage_reason", "Brand", "Verification_number",
"Verification_date", "Verif_validation", "Technical_study_date", "Light_vehicle_speed",
"Heavy_vehicle_speed", "Time_permanence", "Time_stoppage", "Technical_study", "coordinates")
str(speed_cameras_df)## Classes 'tbl_df', 'tbl' and 'data.frame': 5402 obs. of 29 variables:
## $ State : chr "AC" "AC" "AC" "AC" ...
## $ Highway : chr "317" "317" "317" "317" ...
## $ Km : num 89 89 130 130 298 298 9.5 11 118 118 ...
## $ City_name : chr "SENADOR GUIOMARD (01538)" "SENADOR GUIOMARD (01538)" "CAPIXABA (06475)" "CAPIXABA (06475)" ...
## $ Track : chr "P-C-1" "P-D-1" "P-C-1" "P-D-1" ...
## $ Equipment : chr "ACB00303010" "ACB00303020" "ACB00303070" "ACB00303080" ...
## $ Equip_type : chr "Redutor Eletrônico de Velocidade" "Redutor Eletrônico de Velocidade" "Redutor Eletrônico de Velocidade" "Redutor Eletrônico de Velocidade" ...
## $ Notice : chr "Disp. Lic. 2018 / 1" "Disp. Lic. 2018 / 1" "Disp. Lic. 2018 / 1" "Disp. Lic. 2018 / 1" ...
## $ Company : chr "Kopp" "Kopp" "Kopp" "Kopp" ...
## $ Contract : chr "Núm Contrato Lote 1" "Núm Contrato Lote 1" "Núm Contrato Lote 1" "Núm Contrato Lote 1" ...
## $ Contract_start : POSIXct, format: "2018-01-18" "2018-01-18" ...
## $ Contract_end : POSIXct, format: "2018-07-01" "2018-07-01" ...
## $ Track_situation : chr "Paralisada" "Paralisada" "Operando" "Operando" ...
## $ Instalation_note : chr "303 / 2013 - 1" "303 / 2013 - 2" "303 / 2013 - 7" "303 / 2013 - 8" ...
## $ Emission_Date : POSIXct, format: "2013-10-18" "2013-10-18" ...
## $ Operation_Start : POSIXct, format: "2014-03-11 00:00:00" "2014-03-11 00:00:00" ...
## $ Stoppage_start : POSIXct, format: "2016-12-03" "2017-07-14" ...
## $ Stoppage_reason : chr "Vandalismo" "Vandalismo" NA NA ...
## $ Brand : chr "Help / KMLI" "Help / KMLI" "Help / KMLI" "Help / KMLI" ...
## $ Verification_number : chr "D68737324" "D68737309" "G63091276" "G63091081" ...
## $ Verification_date : POSIXct, format: "2016-11-23 00:00:00" "2016-11-23 00:00:00" ...
## $ Verif_validation : POSIXct, format: "2017-11-22" "2017-11-22" ...
## $ Technical_study_date: POSIXct, format: "2013-09-19" "2013-09-19" ...
## $ Light_vehicle_speed : num 50 50 50 50 60 60 60 60 50 50 ...
## $ Heavy_vehicle_speed : num 50 50 50 50 60 60 60 60 50 50 ...
## $ Time_permanence : num NA NA NA NA NA NA NA NA NA NA ...
## $ Time_stoppage : num NA NA NA NA NA NA NA NA NA NA ...
## $ Technical_study : num 4251 4252 4259 4260 4261 ...
## $ coordinates : chr "-10.153854 / -67.713892" "-10.153862 / -67.713609" "-10.444258 / -67.710047" "-10.444085 / -67.709947" ...
So, How many speed cameras are in Brazil?
## # A tibble: 1 x 1
## n
## <int>
## 1 5402
Seems like there’s 5402 (in July, 2018) in total, but its possible that there’re some cameras that are not working. We can use Track_situation and filter only for functional cameras:
## # A tibble: 1 x 1
## n
## <int>
## 1 4862
There’re other interesting column called Stoppage_reason with the option to filter for speed cameras that stopped working because of “vandalism”.
## # A tibble: 1 x 1
## n
## <int>
## 1 143
Looking in speed cameras by State, Minas Gerais has the most number of cameras. And its not a surprise, because Minas is the third most populous state in Brazil and have one of the largest areas. But surprisily, São Paulo, the most populous state from Brazil has only 31 speed cameras. A good explanation for that is the that there’s a lot of state highway owned in SP.
speed_cameras_df %>% group_by(State) %>%
filter(Track_situation == "Operando") %>%
mutate(n_cameras = n()) %>%
select(State, n_cameras) %>%
distinct(State, .keep_all = T) %>%
arrange(desc(n_cameras)) %>%
print(n = 27)## # A tibble: 27 x 2
## # Groups: State [27]
## State n_cameras
## <chr> <int>
## 1 MG 732
## 2 BA 391
## 3 SC 378
## 4 RS 339
## 5 CE 321
## 6 RN 241
## 7 ES 240
## 8 RJ 216
## 9 MT 206
## 10 GO 187
## 11 PE 186
## 12 MS 177
## 13 PI 172
## 14 AL 153
## 15 MA 143
## 16 PR 140
## 17 PB 138
## 18 SE 100
## 19 PA 90
## 20 TO 82
## 21 RO 81
## 22 DF 66
## 23 SP 31
## 24 AM 22
## 25 RR 18
## 26 AC 10
## 27 AP 2
We can now take a look in the plot for speed cameras, but first we’ll need create a separated column for latitude and longitude. For that, we’ll use the function separate() for split the text in coordinates column using the separator " / " and creating Lat and Long columns in the process.
speed_cameras_df <- speed_cameras_df %>%
separate(coordinates, c("Lat","Long"), sep = " / ", convert = TRUE)Now we can plot a simple map using ggplot from the ggplot2 package. The real power of ggplot2 lies in its ability to add layers to a plot. So, we always start with a blank slate (ie, ggplot(speed_cameras_df)) and add layers upon that, like points (geom_point()), histograms(geom_histogram()), polygons(geom_polygon), texts (geom_text()), etc.
Our dataset is a collection of points, so we’ll use geom_point(), but before that is important to declare our y-variable and x-variable in the argument aes().
Now, we can add the shapefile for Brazilian territory to make our map more interesting. And a cool way to make that is using the ggmap package. The way that ggmap works its plot a raster object produced by the get_map() function that queries the Google Maps servers (OpenStreetMap, etc) for a map.
For the Google maps, the user need to create a API key using https://cloud.google.com/maps-platform. After create the account, the API need to be registred on the R session using the function register_google(key = "mQkzTpiaLYjPqXQBotesgif3EfGL2dbrNVOrogg") (that’s a fake key).
So, let’s querie a map for Brazil using get_map() and save as Brazil. Now we can plot using ggmap(Brazil) that produces:
## Source : https://maps.googleapis.com/maps/api/staticmap?center=Brazil&zoom=4&size=640x640&scale=2&maptype=terrain&language=en-EN&key=xxx
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Brazil&key=xxx
And finally we can add the points layer upon the Brazil layer.