The main site in Mexico that concentrates all Mexican government datasets under the country's Open Data Initiative is datos.gob.mx, so we first visit the site, shown in the next figure:
Search the site for what you need. In this case, we need to retrieve all the sites in Mexico City connected to the México Conectado network. This dataset is listed on the main page, as shown in the next figure: scroll down until the section shown in the figure appears, then select Puntos México Conectado con acceso gratuito a internet.
As seen in the next figure, clicking the link opens the page for Puntos México Conectado con acceso gratuito a internet. Click on Datos y Recursos.
Copy the URL of the file by right-clicking the download link:
The URL is then saved in a character string variable in the R session.
Download the file with the download.file command, then load it into the R session with the read.csv command.
# Saving the URL in a variable:
WiFi <- "https://drive.google.com/uc?export=download&id=0B5wc002kIlphYWFfZV9CZHJWc1E"
# Download the file with the URL link:
download.file(WiFi, destfile = "wifi.csv")
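# Read the file into the session. stringsAsFactors = TRUE reproduces the
# factor columns shown below on R >= 4.0, where the default is FALSE:
wifi <- read.csv("wifi.csv", stringsAsFactors = TRUE)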
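When the analysis is re-run many times, it can help to download the file only once. The following is a minimal sketch of that idea, not part of the original workflow: `fetch_csv` is a hypothetical helper name, and the URL in the demonstration is a placeholder that is never contacted because a local copy already exists.

```r
# Hypothetical helper: download a CSV only when no local copy exists,
# then read it with character columns converted to factors.
fetch_csv <- function(url, destfile, ...) {
  if (!file.exists(destfile)) {
    download.file(url, destfile = destfile, mode = "wb")
  }
  read.csv(destfile, stringsAsFactors = TRUE, ...)
}

# Self-contained demonstration: a small local file stands in for the download.
demo_file <- tempfile(fileext = ".csv")
writeLines(c("proveedor,latitud", "AXTEL,19.4", "TELMEX,19.5"), demo_file)

# The placeholder URL is not used, since demo_file already exists.
demo <- fetch_csv("https://example.invalid/not-used.csv", demo_file)
str(demo)
```

With the real dataset, the call would presumably look like `wifi <- fetch_csv(WiFi, "wifi.csv")`.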
To understand the nature of the dataset, we explore it with some R commands:
# The name of the class of an object can be determined by:
class(wifi)
## [1] "data.frame"
# To know the structure of the dataframe:
str(wifi)
## 'data.frame': 3728 obs. of 12 variables:
## $ gid : Factor w/ 3728 levels "100138","100139",..: 972 971 970 969 968 967 966 965 964 963 ...
## $ clave_de_centro_de_trabajo : Factor w/ 3090 levels "-","09BBE0010Q",..: 2860 2862 2759 2912 2820 2788 2536 1899 2751 2954 ...
## $ nombre_de_centro_de_trabajo: Factor w/ 3057 levels "15 DE SEPTIEMBRE",..: 434 476 425 427 1288 577 1376 2329 412 461 ...
## $ latitud : num 19.4 19.4 19.5 19.4 19.4 ...
## $ longitud : num -99.1 -99.1 -99.1 -99.1 -99.1 ...
## $ tipo_de_vialidad : Factor w/ 18 levels "ANDADOR","AVENIDA",..: 4 15 4 15 4 4 4 4 4 4 ...
## $ nombre_de_calle : Factor w/ 2310 levels "1","10","100 METROS",..: 521 1084 1084 1296 2056 1979 2065 928 2077 445 ...
## $ numero_exterior : Factor w/ 593 levels "1","10","100",..: 593 593 593 32 593 19 17 593 593 1 ...
## $ numero_interior : Factor w/ 28 levels "10","12","1699",..: 28 28 28 28 28 28 28 28 28 9 ...
## $ nombre_del_proveedor_mc : Factor w/ 5 levels "AXTEL","OPERBES",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ tipo_de_conectividad : Factor w/ 3 levels "GRANDES ANCHOS DE BANDA",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ estatus : Factor w/ 6 levels "CANCELADO POR REASIGNAR",..: 2 2 2 2 2 2 2 2 2 2 ...
# To know the number of rows:
nrow(wifi)
## [1] 3728
# To know the number of columns:
ncol(wifi)
## [1] 12
# Or simply:
dim(wifi)
## [1] 3728 12
# The variable names are accessible from:
names(wifi)
## [1] "gid" "clave_de_centro_de_trabajo"
## [3] "nombre_de_centro_de_trabajo" "latitud"
## [5] "longitud" "tipo_de_vialidad"
## [7] "nombre_de_calle" "numero_exterior"
## [9] "numero_interior" "nombre_del_proveedor_mc"
## [11] "tipo_de_conectividad" "estatus"
# A single variable can be extracted from the data frame by name; here we check its class:
class(wifi[,"nombre_de_centro_de_trabajo"])
## [1] "factor"
# To see how many entries this variable has (one per row, not necessarily distinct centers):
length(wifi[,"nombre_de_centro_de_trabajo"])
## [1] 3728
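Note that `length()` counts rows, while the `str()` output above reports 3057 levels for `nombre_de_centro_de_trabajo`, so some names repeat. The sketch below shows one way to count distinct values; it uses a small made-up data frame (`toy`, with invented names) rather than the real dataset.

```r
# Toy data frame with one repeated name (values are invented for illustration).
toy <- data.frame(
  nombre  = factor(c("ESCUELA A", "ESCUELA B", "ESCUELA A", "BIBLIOTECA C")),
  latitud = c(19.40, 19.41, 19.42, 19.43)
)

n_rows   <- length(toy$nombre)          # number of entries, one per row
n_unique <- length(unique(toy$nombre))  # number of distinct names

# Names that occur more than once, with their frequencies:
name_counts <- table(toy$nombre)
repeated    <- name_counts[name_counts > 1]

n_rows
n_unique
repeated
```

On the real data, the same calls would be applied to `wifi$nombre_de_centro_de_trabajo`.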
Almost always, we need to clean and fix the data. In the next chunk, we convert some variables to character as needed.
wifi$nombre_de_centro_de_trabajo <- as.character(wifi$nombre_de_centro_de_trabajo)
wifi$nombre_de_calle <- as.character(wifi$nombre_de_calle)
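Another cleaning step that may be worth considering: several columns use placeholder strings such as "-" or "SIN NÚMERO" where no value was recorded. The sketch below converts such placeholders to `NA` on a small made-up data frame. Treating these strings as missing values is an assumption about the dataset, not something its publisher documents here.

```r
# Toy data with placeholder strings (values are invented for illustration).
toy <- data.frame(
  clave           = c("09DPR2065L", "-", "09DBP0002L"),
  numero_exterior = c("SIN NÚMERO", "10", "24"),
  stringsAsFactors = TRUE
)

# Strings assumed to mean "no value recorded".
placeholders <- c("-", "SIN NÚMERO")

clean <- toy
for (col in names(clean)) {
  values <- as.character(clean[[col]])
  values[values %in% placeholders] <- NA
  clean[[col]] <- factor(values)  # rebuild the factor without the placeholder level
}

# Number of missing values per column after cleaning:
colSums(is.na(clean))
```

A similar effect can be obtained at load time through the `na.strings` argument of `read.csv`.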
To continue exploring and summarizing the dataset, we run a few more R commands.
# To get a summary of the dataset:
summary(wifi)
## gid clave_de_centro_de_trabajo nombre_de_centro_de_trabajo
## 100138 : 1 - : 541 Length:3728
## 100139 : 1 09DPR2065L: 3 Class :character
## 100140 : 1 09DBP0002L: 2 Mode :character
## 100141 : 1 09DBT0013D: 2
## 100142 : 1 09DBT0025I: 2
## 100143 : 1 09DBT0028F: 2
## (Other):3722 (Other) :3176
## latitud longitud tipo_de_vialidad nombre_de_calle
## Min. :19.19 Min. :-99.31 CALLE :2546 Length:3728
## 1st Qu.:19.34 1st Qu.:-99.17 AVENIDA : 739 Class :character
## Median :19.39 Median :-99.13 CALZADA : 111 Mode :character
## Mean :19.39 Mean :-99.13 NINGUNO : 83
## 3rd Qu.:19.45 3rd Qu.:-99.09 PROLONGACIÓN: 53
## Max. :19.56 Max. :-98.96 CERRADA : 44
## (Other) : 152
## numero_exterior numero_interior nombre_del_proveedor_mc
## SIN NÚMERO:2199 SIN NÚMERO:3698 AXTEL : 224
## 1 : 42 A : 4 OPERBES : 413
## 10 : 30 10 : 1 TELECOMM : 5
## 2 : 28 12 : 1 TELMEX : 3
## 4 : 19 1699 : 1 TOTAL PLAY:3083
## 24 : 18 2209 : 1
## (Other) :1392 (Other) : 22
## tipo_de_conectividad estatus
## GRANDES ANCHOS DE BANDA: 364 CANCELADO POR REASIGNAR : 5
## SATELITAL : 5 EN PROCESO DE INSTALACIÓN: 254
## TERRESTRE :3359 IMPLEMENTADO : 100
## INSTALADO : 339
## OPERACIÓN :3002
## VISITA FALLIDA : 28
##
# A more detailed summary, printed one variable at a time:
lapply(wifi, summary)
## $gid
## 100138 100139 100140 100141 100142 100143 100144 100145 100146
## 1 1 1 1 1 1 1 1 1
## 100147 100148 100149 100150 100151 100152 100153 100154 100155
## 1 1 1 1 1 1 1 1 1
## 100158 100159 100160 100161 100163 100164 100165 100166 100167
## 1 1 1 1 1 1 1 1 1
## 100168 100169 100170 100171 100172 100173 100174 100175 100176
## 1 1 1 1 1 1 1 1 1
## 100177 100178 100179 100180 100181 100182 100184 100185 100186
## 1 1 1 1 1 1 1 1 1
## 100187 100188 100189 100190 100191 100192 100193 100194 100195
## 1 1 1 1 1 1 1 1 1
## 100196 100197 100198 100199 100200 100201 100202 100203 100205
## 1 1 1 1 1 1 1 1 1
## 100206 100207 100208 100209 100210 100211 100212 100213 100214
## 1 1 1 1 1 1 1 1 1
## 100215 100216 100218 100219 100220 100221 100222 100223 100224
## 1 1 1 1 1 1 1 1 1
## 100225 100226 100227 100228 100229 100230 100232 100233 100234
## 1 1 1 1 1 1 1 1 1
## 100235 100236 100237 100238 100239 100243 100244 100247 100248
## 1 1 1 1 1 1 1 1 1
## (Other)
## 3629
##
## $clave_de_centro_de_trabajo
## - 09DPR2065L 09DBP0002L 09DBT0013D 09DBT0025I 09DBT0028F
## 541 3 2 2 2 2
## 09DBT0066I 09DBT0075Q 09DBT0127F 09DBT0162L 09DBT0167G 09DBT0187U
## 2 2 2 2 2 2
## 09DCT0018Y 09DCT0019X 09DCT0020M 09DCT0022K 09DCT0031S 09DCT0403S
## 2 2 2 2 2 2
## 09DES0087C 09DES0127N 09DES0150O 09DES0312J 09DJN0233N 09DJN0373N
## 2 2 2 2 2 2
## 09DJN0448N 09DJN0489N 09DJN0512Y 09DJN0547N 09DJN0745N 09DJN0943N
## 2 2 2 2 2 2
## 09DLT0001F 09DML0031Q 09DML0035M 09DML0059W 09DPR0851N 09DPR0902D
## 2 2 2 2 2 2
## 09DPR0910M 09DPR0929K 09DPR1013Z 09DPR1067C 09DPR1085S 09DPR1120H
## 2 2 2 2 2 2
## 09DPR1150B 09DPR1177I 09DPR1184S 09DPR1188O 09DPR1206N 09DPR1227Z
## 2 2 2 2 2 2
## 09DPR1265C 09DPR1280V 09DPR1323C 09DPR1371M 09DPR1377G 09DPR1408J
## 2 2 2 2 2 2
## 09DPR1432J 09DPR1534G 09DPR1540R 09DPR1558Q 09DPR1562C 09DPR1584O
## 2 2 2 2 2 2
## 09DPR1626X 09DPR1641P 09DPR1643N 09DPR1654T 09DPR1667X 09DPR1726W
## 2 2 2 2 2 2
## 09DPR1740P 09DPR1771I 09DPR1795S 09DPR1807G 09DPR1994R 09DPR2145X
## 2 2 2 2 2 2
## 09DPR2244X 09DPR2280B 09DPR2450F 09DPR2495B 09DPR2570S 09DPR2696Z
## 2 2 2 2 2 2
## 09DPR2702T 09DPR2783U 09DPR2883T 09DPR3034Z 09DPR3219E 09DPR3231Z
## 2 2 2 2 2 2
## 09DPR3266P 09DPR5087R 09DPR5093B 09DST0045W 09DST0083Z 09DST0116Z
## 2 2 2 2 2 2
## 09FLB0001O DFIMS000172 DFIMS000324 DFIMS000365 DFIMS000621 DFIMS000633
## 2 2 2 2 2 2
## DFIST000160 DFSSA002993 09BBE0010Q (Other)
## 2 2 1 2991
##
## $nombre_de_centro_de_trabajo
## Length Class Mode
## 3728 character character
##
## $latitud
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 19.19 19.34 19.39 19.39 19.45 19.56
##
## $longitud
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -99.31 -99.17 -99.13 -99.13 -99.09 -98.96
##
## $tipo_de_vialidad
## ANDADOR AVENIDA BOULEVARD CALLE CALLEJÓN
## 19 739 5 2546 11
## CALZADA CAMINO CARRETERA CERRADA CIRCUITO
## 111 22 13 44 5
## EJE VIAL NINGUNO PERIFÉRICO PRIVADA PROLONGACIÓN
## 27 83 6 9 53
## RETORNO VEREDA VIADUCTO
## 30 1 4
##
## $nombre_de_calle
## Length Class Mode
## 3728 character character
##
## $numero_exterior
## SIN NÚMERO 1 10 2 4 24
## 2199 42 30 28 19 18
## 11 15 19 20 3 50
## 16 15 15 15 15 15
## 14 7 39 16 18 30
## 14 14 13 12 12 12
## 45 5 6 25 38 54
## 12 12 12 11 11 11
## 22 33 40 12 13 23
## 10 10 10 9 9 9
## 43 60 72 75 100 26
## 9 9 9 9 8 8
## 27 34 41 53 8 110
## 8 8 8 8 8 7
## 138 200 230 29 37 69
## 7 7 7 7 7 7
## 9 94 121 126 159 17
## 7 7 6 6 6 6
## 21 36 44 51 59 66
## 6 6 6 6 6 6
## 79 80 85 98 102 105
## 6 6 6 6 5 5
## 107 117 120 123 140 143
## 5 5 5 5 5 5
## 160 180 300 304 31 32
## 5 5 5 5 5 5
## 35 42 55 62 64 71
## 5 5 5 5 5 5
## 78 82 83 86 103 119
## 5 5 5 5 4 4
## 127 144 152 170 188 222
## 4 4 4 4 4 4
## 235 47 52 (Other)
## 4 4 4 695
##
## $numero_interior
## 10 12 1699 2209 42
## 1 1 1 1 1
## 49 5 6-A A B
## 1 1 1 4 1
## BIS E LOCAL 10 LOCAL 11 LOCAL 3
## 1 1 1 1 1
## LOCAL 30 LOCAL A-5 LOTE 4 PISO 4 PISO 6
## 1 1 1 1 1
## PISO 7 PISO E-3 PISOS 1 Y 2 PLAZA 15 PLAZA 30
## 1 1 1 1 1
## PLAZA 5 SEGUNDO PISO SIN NÚMERO
## 1 1 3698
##
## $nombre_del_proveedor_mc
## AXTEL OPERBES TELECOMM TELMEX TOTAL PLAY
## 224 413 5 3 3083
##
## $tipo_de_conectividad
## GRANDES ANCHOS DE BANDA SATELITAL TERRESTRE
## 364 5 3359
##
## $estatus
## CANCELADO POR REASIGNAR EN PROCESO DE INSTALACIÓN
## 5 254
## IMPLEMENTADO INSTALADO
## 100 339
## OPERACIÓN VISITA FALLIDA
## 3002 28
# Analysing by number of providers:
class(wifi[,"nombre_del_proveedor_mc"])
## [1] "factor"
# Since the variable is already a factor, its levels give the provider names:
levels(wifi[,"nombre_del_proveedor_mc"])
## [1] "AXTEL" "OPERBES" "TELECOMM" "TELMEX" "TOTAL PLAY"
length(levels(wifi[,"nombre_del_proveedor_mc"]))
## [1] 5
# As a simple summary statistic, the frequencies of the levels
# of a factor variable can be found with the table command:
table(wifi[,"nombre_del_proveedor_mc"])
##
## AXTEL OPERBES TELECOMM TELMEX TOTAL PLAY
## 224 413 5 3 3083
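Raw counts are easier to compare as percentages. As a small sketch, the block below rebuilds a factor from the provider counts printed above and applies `prop.table` to it; the object names (`counts`, `providers`, `shares`) are introduced only for this illustration.

```r
# Provider counts copied from the table output above.
counts <- c(AXTEL = 224, OPERBES = 413, TELECOMM = 5, TELMEX = 3,
            "TOTAL PLAY" = 3083)

# Rebuild a factor with one entry per service, so the example is self-contained.
providers <- factor(rep(names(counts), times = counts))

tab    <- table(providers)
shares <- round(100 * prop.table(tab), 1)  # percentage of services per provider

shares
```

With the real data frame, `prop.table(table(wifi[,"nombre_del_proveedor_mc"]))` should give the same proportions.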
As part of Exploratory Data Analysis, we also need some graphic output. In the next chunk, we generate a bar chart of the providers and the number of services each one provides to México Conectado.
# To generate a graphic:
library(ggplot2)
tablewifi <- as.data.frame(table(wifi[,"nombre_del_proveedor_mc"]))
names(tablewifi) <- c("Providers", "Number")
# Show the counts as a bar chart (the providers are categories, so this is
# a bar chart rather than a histogram):
wifiplot <- ggplot(data = tablewifi, aes(x = Providers, y = Number)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label=Number), vjust=-0.3, size=3.5) +
  labs(title = "Providers of México Conectado") +
  labs(x = "Company", y = "Number of Services") +
  theme_minimal()
wifiplot
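By default the bars follow the alphabetical order of the factor levels. One possible refinement, sketched below under the assumption that ordering by frequency is desired, is to reorder the levels with `reorder()` before plotting. The data frame is rebuilt from the counts shown earlier so the example stands alone, and the plot is drawn only if ggplot2 is installed.

```r
# Provider counts rebuilt from the table output shown earlier.
tablewifi <- data.frame(
  Providers = factor(c("AXTEL", "OPERBES", "TELECOMM", "TELMEX", "TOTAL PLAY")),
  Number    = c(224, 413, 5, 3, 3083)
)

# Reorder the factor levels by decreasing count (negative sign = largest first).
tablewifi$Providers <- reorder(tablewifi$Providers, -tablewifi$Number)
levels(tablewifi$Providers)

# Draw the sorted bar chart only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  sortedplot <- ggplot(tablewifi, aes(x = Providers, y = Number)) +
    geom_col(fill = "steelblue") +
    labs(x = "Company", y = "Number of Services") +
    theme_minimal()
  print(sortedplot)
}
```

`geom_col()` is equivalent to `geom_bar(stat = "identity")` and is used here only for brevity.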
An article in two parts appears in the Big Data section of Emprendedores Journal: