Introduction


Since Mexico adopted its Open Data Initiative, the main site in charge of concentrating all datasets of the Mexican government is datos.gob.mx, so we first visit the site, shown in the next figure:


Searching datos.gob.mx


Search the site for what you need. In this case, we need to retrieve all the sites in Mexico City connected to the México Conectado network. This dataset can be reached from the main page: scroll down until the section shown in the next figure appears, then select Puntos México Conectado con acceso gratuito a internet.


As seen in the next figure, clicking the link opens the page for Puntos México Conectado con acceso gratuito a internet. Click on Datos y Recursos.


Copy the URL of the file by right-clicking its link:


The URL is then stored in a character string variable in the R session.

Download the file with the download.file command, then load it into the R session with the read.csv command.

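# Saving the URL in a variable:
WiFi <- "https://drive.google.com/uc?export=download&id=0B5wc002kIlphYWFfZV9CZHJWc1E"
# Download the file with the URL link:
download.file(WiFi, destfile = "wifi.csv")
# Read the file into the session. Since R 4.0.0, read.csv no longer converts
# strings to factors by default; stringsAsFactors = TRUE reproduces the
# factor columns shown in the output below:
wifi <- read.csv("wifi.csv", stringsAsFactors = TRUE)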
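
Re-running the chunk above downloads the file every time. As a minimal sketch, the download and read steps can be wrapped in a helper that skips the download when a local copy already exists. The function name fetch_csv and its downloader argument are hypothetical and not part of the original tutorial; only download.file, file.exists and read.csv are standard R functions.

```r
# Hypothetical helper (an assumption, not from the original tutorial):
# download a CSV only when no local copy exists, then read it.
fetch_csv <- function(url, destfile, downloader = download.file) {
  if (!file.exists(destfile)) {
    # mode = "wb" requests a binary transfer, so the file is stored
    # byte for byte as served (relevant mainly on Windows)
    downloader(url, destfile = destfile, mode = "wb")
  }
  # stringsAsFactors = TRUE keeps text columns as factors
  read.csv(destfile, stringsAsFactors = TRUE)
}

# Usage with the URL variable defined above (requires network access):
# wifi <- fetch_csv(WiFi, "wifi.csv")
```

Passing the downloader as an argument keeps the helper easy to try without a network connection, at the cost of a slightly longer signature.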

Exploring the Data


To understand the nature of the dataset, we explore it with some R commands:

# The name of the class of an object can be determined by:
class(wifi)
## [1] "data.frame"
# To know the structure of the dataframe:
str(wifi)
## 'data.frame':    3728 obs. of  12 variables:
##  $ gid                        : Factor w/ 3728 levels "100138","100139",..: 972 971 970 969 968 967 966 965 964 963 ...
##  $ clave_de_centro_de_trabajo : Factor w/ 3090 levels "-","09BBE0010Q",..: 2860 2862 2759 2912 2820 2788 2536 1899 2751 2954 ...
##  $ nombre_de_centro_de_trabajo: Factor w/ 3057 levels "15 DE SEPTIEMBRE",..: 434 476 425 427 1288 577 1376 2329 412 461 ...
##  $ latitud                    : num  19.4 19.4 19.5 19.4 19.4 ...
##  $ longitud                   : num  -99.1 -99.1 -99.1 -99.1 -99.1 ...
##  $ tipo_de_vialidad           : Factor w/ 18 levels "ANDADOR","AVENIDA",..: 4 15 4 15 4 4 4 4 4 4 ...
##  $ nombre_de_calle            : Factor w/ 2310 levels "1","10","100 METROS",..: 521 1084 1084 1296 2056 1979 2065 928 2077 445 ...
##  $ numero_exterior            : Factor w/ 593 levels "1","10","100",..: 593 593 593 32 593 19 17 593 593 1 ...
##  $ numero_interior            : Factor w/ 28 levels "10","12","1699",..: 28 28 28 28 28 28 28 28 28 9 ...
##  $ nombre_del_proveedor_mc    : Factor w/ 5 levels "AXTEL","OPERBES",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ tipo_de_conectividad       : Factor w/ 3 levels "GRANDES ANCHOS DE BANDA",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ estatus                    : Factor w/ 6 levels "CANCELADO POR REASIGNAR",..: 2 2 2 2 2 2 2 2 2 2 ...
# To know the number of rows: 
nrow(wifi)
## [1] 3728
# To know the number of columns:
ncol(wifi)
## [1] 12
# Or simply:
dim(wifi)
## [1] 3728   12
# The variable names are accessible from:
names(wifi)
##  [1] "gid"                         "clave_de_centro_de_trabajo" 
##  [3] "nombre_de_centro_de_trabajo" "latitud"                    
##  [5] "longitud"                    "tipo_de_vialidad"           
##  [7] "nombre_de_calle"             "numero_exterior"            
##  [9] "numero_interior"             "nombre_del_proveedor_mc"    
## [11] "tipo_de_conectividad"        "estatus"
# A single variable can be extracted by its column name; here we check its class:
class(wifi[,"nombre_de_centro_de_trabajo"])
## [1] "factor"
# To see how many entries this variable has (one per row, repeated names included):
length(wifi[,"nombre_de_centro_de_trabajo"])
## [1] 3728
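
Note that length() on a column counts rows, not distinct values: the str() output above reports 3728 observations but only 3057 levels for nombre_de_centro_de_trabajo. A minimal sketch of the difference, using a made-up four-row data frame (the center names below are invented for illustration):

```r
# Toy data standing in for the real dataset (values are made up)
centers <- data.frame(
  nombre_de_centro_de_trabajo = factor(
    c("BIBLIOTECA A", "ESCUELA B", "BIBLIOTECA A", "CENTRO C")
  )
)

col <- centers[, "nombre_de_centro_de_trabajo"]

n_rows   <- length(col)          # number of entries, duplicates included
n_unique <- length(unique(col))  # number of distinct names
n_levels <- nlevels(col)         # number of factor levels

c(rows = n_rows, unique = n_unique, levels = n_levels)
```

On the real data, the same three calls applied to wifi[, "nombre_de_centro_de_trabajo"] should show whether some center names appear more than once.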

Tidy Data


Almost always, we need to clean and fix the data. In the next chunk, we convert some variables to character as needed.

wifi$nombre_de_centro_de_trabajo <- as.character(wifi$nombre_de_centro_de_trabajo)
wifi$nombre_de_calle <- as.character(wifi$nombre_de_calle)
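
A related point when converting factors: as.character() returns the labels, while converting a factor directly to a number returns its internal level codes. The sketch below uses a small made-up factor to show the difference; it assumes you might later want a factor of digits, such as gid, as numbers, which the original tutorial does not do.

```r
# Toy factor whose labels look like numbers (values chosen for illustration)
f <- factor(c("100250", "100138", "100139"))

labels_chr <- as.character(f)              # the labels, as text
codes      <- as.integer(f)                # internal level codes, not the labels
values     <- as.integer(as.character(f))  # the labels, as numbers

codes
values
```

Going through as.character() first is the usual way to recover the original numbers from such a factor.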

Exploratory Data Analysis


To continue exploring and summarizing the dataset, we run some more R commands.

# To get a summary of the dataset:
summary(wifi)
##       gid       clave_de_centro_de_trabajo nombre_de_centro_de_trabajo
##  100138 :   1   -         : 541            Length:3728                
##  100139 :   1   09DPR2065L:   3            Class :character           
##  100140 :   1   09DBP0002L:   2            Mode  :character           
##  100141 :   1   09DBT0013D:   2                                       
##  100142 :   1   09DBT0025I:   2                                       
##  100143 :   1   09DBT0028F:   2                                       
##  (Other):3722   (Other)   :3176                                       
##     latitud         longitud          tipo_de_vialidad nombre_de_calle   
##  Min.   :19.19   Min.   :-99.31   CALLE       :2546    Length:3728       
##  1st Qu.:19.34   1st Qu.:-99.17   AVENIDA     : 739    Class :character  
##  Median :19.39   Median :-99.13   CALZADA     : 111    Mode  :character  
##  Mean   :19.39   Mean   :-99.13   NINGUNO     :  83                      
##  3rd Qu.:19.45   3rd Qu.:-99.09   PROLONGACIÓN:  53                      
##  Max.   :19.56   Max.   :-98.96   CERRADA     :  44                      
##                                   (Other)     : 152                      
##    numero_exterior   numero_interior nombre_del_proveedor_mc
##  SIN NÚMERO:2199   SIN NÚMERO:3698   AXTEL     : 224        
##  1         :  42   A         :   4   OPERBES   : 413        
##  10        :  30   10        :   1   TELECOMM  :   5        
##  2         :  28   12        :   1   TELMEX    :   3        
##  4         :  19   1699      :   1   TOTAL PLAY:3083        
##  24        :  18   2209      :   1                          
##  (Other)   :1392   (Other)   :  22                          
##               tipo_de_conectividad                      estatus    
##  GRANDES ANCHOS DE BANDA: 364      CANCELADO POR REASIGNAR  :   5  
##  SATELITAL              :   5      EN PROCESO DE INSTALACIÓN: 254  
##  TERRESTRE              :3359      IMPLEMENTADO             : 100  
##                                    INSTALADO                : 339  
##                                    OPERACIÓN                :3002  
##                                    VISITA FALLIDA           :  28  
## 
# A more detailed summary, one variable at a time:
lapply(wifi, summary)
## $gid
##  100138  100139  100140  100141  100142  100143  100144  100145  100146 
##       1       1       1       1       1       1       1       1       1 
##  100147  100148  100149  100150  100151  100152  100153  100154  100155 
##       1       1       1       1       1       1       1       1       1 
##  100158  100159  100160  100161  100163  100164  100165  100166  100167 
##       1       1       1       1       1       1       1       1       1 
##  100168  100169  100170  100171  100172  100173  100174  100175  100176 
##       1       1       1       1       1       1       1       1       1 
##  100177  100178  100179  100180  100181  100182  100184  100185  100186 
##       1       1       1       1       1       1       1       1       1 
##  100187  100188  100189  100190  100191  100192  100193  100194  100195 
##       1       1       1       1       1       1       1       1       1 
##  100196  100197  100198  100199  100200  100201  100202  100203  100205 
##       1       1       1       1       1       1       1       1       1 
##  100206  100207  100208  100209  100210  100211  100212  100213  100214 
##       1       1       1       1       1       1       1       1       1 
##  100215  100216  100218  100219  100220  100221  100222  100223  100224 
##       1       1       1       1       1       1       1       1       1 
##  100225  100226  100227  100228  100229  100230  100232  100233  100234 
##       1       1       1       1       1       1       1       1       1 
##  100235  100236  100237  100238  100239  100243  100244  100247  100248 
##       1       1       1       1       1       1       1       1       1 
## (Other) 
##    3629 
## 
## $clave_de_centro_de_trabajo
##           -  09DPR2065L  09DBP0002L  09DBT0013D  09DBT0025I  09DBT0028F 
##         541           3           2           2           2           2 
##  09DBT0066I  09DBT0075Q  09DBT0127F  09DBT0162L  09DBT0167G  09DBT0187U 
##           2           2           2           2           2           2 
##  09DCT0018Y  09DCT0019X  09DCT0020M  09DCT0022K  09DCT0031S  09DCT0403S 
##           2           2           2           2           2           2 
##  09DES0087C  09DES0127N  09DES0150O  09DES0312J  09DJN0233N  09DJN0373N 
##           2           2           2           2           2           2 
##  09DJN0448N  09DJN0489N  09DJN0512Y  09DJN0547N  09DJN0745N  09DJN0943N 
##           2           2           2           2           2           2 
##  09DLT0001F  09DML0031Q  09DML0035M  09DML0059W  09DPR0851N  09DPR0902D 
##           2           2           2           2           2           2 
##  09DPR0910M  09DPR0929K  09DPR1013Z  09DPR1067C  09DPR1085S  09DPR1120H 
##           2           2           2           2           2           2 
##  09DPR1150B  09DPR1177I  09DPR1184S  09DPR1188O  09DPR1206N  09DPR1227Z 
##           2           2           2           2           2           2 
##  09DPR1265C  09DPR1280V  09DPR1323C  09DPR1371M  09DPR1377G  09DPR1408J 
##           2           2           2           2           2           2 
##  09DPR1432J  09DPR1534G  09DPR1540R  09DPR1558Q  09DPR1562C  09DPR1584O 
##           2           2           2           2           2           2 
##  09DPR1626X  09DPR1641P  09DPR1643N  09DPR1654T  09DPR1667X  09DPR1726W 
##           2           2           2           2           2           2 
##  09DPR1740P  09DPR1771I  09DPR1795S  09DPR1807G  09DPR1994R  09DPR2145X 
##           2           2           2           2           2           2 
##  09DPR2244X  09DPR2280B  09DPR2450F  09DPR2495B  09DPR2570S  09DPR2696Z 
##           2           2           2           2           2           2 
##  09DPR2702T  09DPR2783U  09DPR2883T  09DPR3034Z  09DPR3219E  09DPR3231Z 
##           2           2           2           2           2           2 
##  09DPR3266P  09DPR5087R  09DPR5093B  09DST0045W  09DST0083Z  09DST0116Z 
##           2           2           2           2           2           2 
##  09FLB0001O DFIMS000172 DFIMS000324 DFIMS000365 DFIMS000621 DFIMS000633 
##           2           2           2           2           2           2 
## DFIST000160 DFSSA002993  09BBE0010Q     (Other) 
##           2           2           1        2991 
## 
## $nombre_de_centro_de_trabajo
##    Length     Class      Mode 
##      3728 character character 
## 
## $latitud
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   19.19   19.34   19.39   19.39   19.45   19.56 
## 
## $longitud
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -99.31  -99.17  -99.13  -99.13  -99.09  -98.96 
## 
## $tipo_de_vialidad
##      ANDADOR      AVENIDA    BOULEVARD        CALLE     CALLEJÓN 
##           19          739            5         2546           11 
##      CALZADA       CAMINO    CARRETERA      CERRADA     CIRCUITO 
##          111           22           13           44            5 
##     EJE VIAL      NINGUNO   PERIFÉRICO      PRIVADA PROLONGACIÓN 
##           27           83            6            9           53 
##      RETORNO       VEREDA     VIADUCTO 
##           30            1            4 
## 
## $nombre_de_calle
##    Length     Class      Mode 
##      3728 character character 
## 
## $numero_exterior
## SIN NÚMERO          1         10          2          4         24 
##       2199         42         30         28         19         18 
##         11         15         19         20          3         50 
##         16         15         15         15         15         15 
##         14          7         39         16         18         30 
##         14         14         13         12         12         12 
##         45          5          6         25         38         54 
##         12         12         12         11         11         11 
##         22         33         40         12         13         23 
##         10         10         10          9          9          9 
##         43         60         72         75        100         26 
##          9          9          9          9          8          8 
##         27         34         41         53          8        110 
##          8          8          8          8          8          7 
##        138        200        230         29         37         69 
##          7          7          7          7          7          7 
##          9         94        121        126        159         17 
##          7          7          6          6          6          6 
##         21         36         44         51         59         66 
##          6          6          6          6          6          6 
##         79         80         85         98        102        105 
##          6          6          6          6          5          5 
##        107        117        120        123        140        143 
##          5          5          5          5          5          5 
##        160        180        300        304         31         32 
##          5          5          5          5          5          5 
##         35         42         55         62         64         71 
##          5          5          5          5          5          5 
##         78         82         83         86        103        119 
##          5          5          5          5          4          4 
##        127        144        152        170        188        222 
##          4          4          4          4          4          4 
##        235         47         52    (Other) 
##          4          4          4        695 
## 
## $numero_interior
##           10           12         1699         2209           42 
##            1            1            1            1            1 
##           49            5          6-A            A            B 
##            1            1            1            4            1 
##          BIS            E     LOCAL 10     LOCAL 11      LOCAL 3 
##            1            1            1            1            1 
##     LOCAL 30    LOCAL A-5       LOTE 4       PISO 4       PISO 6 
##            1            1            1            1            1 
##       PISO 7     PISO E-3  PISOS 1 Y 2     PLAZA 15     PLAZA 30 
##            1            1            1            1            1 
##      PLAZA 5 SEGUNDO PISO   SIN NÚMERO 
##            1            1         3698 
## 
## $nombre_del_proveedor_mc
##      AXTEL    OPERBES   TELECOMM     TELMEX TOTAL PLAY 
##        224        413          5          3       3083 
## 
## $tipo_de_conectividad
## GRANDES ANCHOS DE BANDA               SATELITAL               TERRESTRE 
##                     364                       5                    3359 
## 
## $estatus
##   CANCELADO POR REASIGNAR EN PROCESO DE INSTALACIÓN 
##                         5                       254 
##              IMPLEMENTADO                 INSTALADO 
##                       100                       339 
##                 OPERACIÓN            VISITA FALLIDA 
##                      3002                        28
# Analysing by number of providers:
class(wifi[,"nombre_del_proveedor_mc"])
## [1] "factor"
# Since the variable is already a factor, we can list the provider names directly:
levels(wifi[,"nombre_del_proveedor_mc"])
## [1] "AXTEL"      "OPERBES"    "TELECOMM"   "TELMEX"     "TOTAL PLAY"
length(levels(wifi[,"nombre_del_proveedor_mc"]))
## [1] 5
# As a simple summary statistic, the frequencies of the levels
# of such a factor variable can be found with table command:
table(wifi[,"nombre_del_proveedor_mc"])
## 
##      AXTEL    OPERBES   TELECOMM     TELMEX TOTAL PLAY 
##        224        413          5          3       3083
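
Raw counts are easier to compare as shares of the total. As a minimal sketch, prop.table() turns a frequency table into proportions. The block rebuilds the provider variable from the counts printed above so that it runs on its own; the object names counts, providers, freq and share are illustrative.

```r
# Provider counts copied from the table() output above
counts <- c(AXTEL = 224, OPERBES = 413, TELECOMM = 5,
            TELMEX = 3, "TOTAL PLAY" = 3083)

# Rebuild a factor with one entry per record, as in the real column
providers <- factor(rep(names(counts), times = counts))

freq  <- table(providers)                  # absolute frequencies
share <- round(100 * prop.table(freq), 1)  # percentage of all records

# Providers ordered from largest to smallest share
sort(share, decreasing = TRUE)
```

With the dataset loaded, the same result should come from prop.table(table(wifi[, "nombre_del_proveedor_mc"])).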

Plotting Descriptive Statistics


As part of Exploratory Data Analysis, we need to generate some graphic output. In the next chunk, we generate a bar chart of the providers and the number of services each one provides to México Conectado.

# To generate a graphic:
library(ggplot2)
tablewifi <- as.data.frame(table(wifi[,"nombre_del_proveedor_mc"]))
names(tablewifi) <- c("Providers", "Number")
# To see this analysis in a bar chart:
wifiplot <- ggplot(data = tablewifi, aes(x = Providers, y = Number)) +
    geom_bar(stat = "identity", fill = "steelblue") +
    geom_text(aes(label=Number), vjust=-0.3, size=3.5) +
    labs(title = "Providers of México Conectado") +
    labs(x = "Company", y = "Number of Services") +
    theme_minimal()

wifiplot
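
If ggplot2 is not installed, a similar chart can be drawn with base R graphics. The following is a sketch of that alternative, not the tutorial's method: it rebuilds the provider table from the counts shown earlier, sorts the bars from largest to smallest, and writes the plot to a PDF in the session's temporary directory. The names tablewifi_sorted and plot_file and the plot title are assumptions made for this example.

```r
# Provider counts copied from the frequency table shown earlier
tablewifi <- data.frame(
  Providers = c("AXTEL", "OPERBES", "TELECOMM", "TELMEX", "TOTAL PLAY"),
  Number    = c(224, 413, 5, 3, 3083)
)

# Sort so that the largest provider comes first
tablewifi_sorted <- tablewifi[order(tablewifi$Number, decreasing = TRUE), ]

# Write the chart to a PDF file in the temporary directory
plot_file <- file.path(tempdir(), "providers.pdf")
pdf(plot_file, width = 7, height = 5)
mids <- barplot(
  tablewifi_sorted$Number,
  names.arg = tablewifi_sorted$Providers,
  col  = "steelblue",
  main = "Services per provider",
  xlab = "Company",
  ylab = "Number of Services",
  ylim = c(0, max(tablewifi_sorted$Number) * 1.1)  # headroom for the labels
)
# barplot() returns the bar midpoints, used here to place the count labels
text(x = mids, y = tablewifi_sorted$Number,
     labels = tablewifi_sorted$Number, pos = 3, cex = 0.8)
dev.off()
```

Sorting the bars makes the gap between the largest provider and the rest easier to read than the alphabetical order.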

Articles on Open Data in Mexico


The Big Data section of Emprendedores Journal features an article in two parts:

Open Data en México 1a. Parte

Open Data en México 2a. Parte