Out[299]:
|
geolocation_zip_code_prefix |
geolocation_lat |
geolocation_lng |
geolocation_city |
geolocation_state |
| 0 |
01037 |
-23.545621 |
-46.639292 |
sao paulo |
SP |
| 1 |
01046 |
-23.546081 |
-46.644820 |
sao paulo |
SP |
| 2 |
01046 |
-23.546129 |
-46.642951 |
sao paulo |
SP |
| 3 |
01041 |
-23.544392 |
-46.639499 |
sao paulo |
SP |
| 4 |
01035 |
-23.541578 |
-46.641607 |
sao paulo |
SP |
| ... |
... |
... |
... |
... |
... |
| 1000158 |
99950 |
-28.068639 |
-52.010705 |
tapejara |
RS |
| 1000159 |
99900 |
-27.877125 |
-52.224882 |
getulio vargas |
RS |
| 1000160 |
99950 |
-28.071855 |
-52.014716 |
tapejara |
RS |
| 1000161 |
99980 |
-28.388932 |
-51.846871 |
david canabarro |
RS |
| 1000162 |
99950 |
-28.070104 |
-52.018658 |
tapejara |
RS |
1000163 rows × 5 columns
Out[301]:
|
geolocation_zip_code_prefix |
geolocation_lat |
geolocation_lng |
geolocation_city |
geolocation_state |
geolocation_zip_code_prefix_1_digits |
geolocation_zip_code_prefix_2_digits |
geolocation_zip_code_prefix_3_digits |
geolocation_zip_code_prefix_4_digits |
| 0 |
01037 |
-23.545621 |
-46.639292 |
sao paulo |
SP |
0 |
01 |
010 |
0103 |
| 1 |
01046 |
-23.546081 |
-46.644820 |
sao paulo |
SP |
0 |
01 |
010 |
0104 |
| 2 |
01046 |
-23.546129 |
-46.642951 |
sao paulo |
SP |
0 |
01 |
010 |
0104 |
| 3 |
01041 |
-23.544392 |
-46.639499 |
sao paulo |
SP |
0 |
01 |
010 |
0104 |
| 4 |
01035 |
-23.541578 |
-46.641607 |
sao paulo |
SP |
0 |
01 |
010 |
0103 |
| 5 |
01012 |
-23.547762 |
-46.635361 |
são paulo |
SP |
0 |
01 |
010 |
0101 |
| 6 |
01047 |
-23.546273 |
-46.641225 |
sao paulo |
SP |
0 |
01 |
010 |
0104 |
| 7 |
01013 |
-23.546923 |
-46.634264 |
sao paulo |
SP |
0 |
01 |
010 |
0101 |
| 8 |
01029 |
-23.543769 |
-46.634278 |
sao paulo |
SP |
0 |
01 |
010 |
0102 |
| 9 |
01011 |
-23.547640 |
-46.636032 |
sao paulo |
SP |
0 |
01 |
010 |
0101 |
There are 18.051 different zip_code_prefix. On average there are 52.6 coordinates for each prefix. But there is one prefix with 1146 coordinates availan;e/
Out[303]:
|
geolocation_zip_code_prefix |
| count |
19015.000000 |
| mean |
52.598633 |
| std |
72.057907 |
| min |
1.000000 |
| 25% |
10.000000 |
| 50% |
29.000000 |
| 75% |
66.500000 |
| max |
1146.000000 |
There are some outliers coordinates in the dataset that are outside of brazilian territory. Lets guarantee that all coordinates are within a rectangle delimited by the limits of Brazil.
Then we treat the latitude and longitude coordinates and transform them to Mercator x/y Coordinates/
Out[306]:
|
geolocation_zip_code_prefix |
geolocation_lat |
geolocation_lng |
geolocation_city |
geolocation_state |
geolocation_zip_code_prefix_1_digits |
geolocation_zip_code_prefix_2_digits |
geolocation_zip_code_prefix_3_digits |
geolocation_zip_code_prefix_4_digits |
x |
y |
| 0 |
01037 |
-23.545621 |
-46.639292 |
sao paulo |
SP |
0 |
01 |
010 |
0103 |
-5.191862e+06 |
-2.698137e+06 |
| 1 |
01046 |
-23.546081 |
-46.644820 |
sao paulo |
SP |
0 |
01 |
010 |
0104 |
-5.192478e+06 |
-2.698193e+06 |
| 2 |
01046 |
-23.546129 |
-46.642951 |
sao paulo |
SP |
0 |
01 |
010 |
0104 |
-5.192270e+06 |
-2.698199e+06 |
Finnally ploting the coordinates on a map. We see there is a relationship between the zip code prefix and the location of that zip code. They start in Sao Paulo, with prefix 01001, and then increase counterclockwise finishing in Rio Grande do Sul (south of Brazil) with prefix 99990.
Out[308]:
|
geolocation_zip_code_prefix |
| count |
1.000121e+06 |
| mean |
3.657332e+04 |
| std |
3.054939e+04 |
| min |
1.001000e+03 |
| 25% |
1.107500e+04 |
| 50% |
2.653000e+04 |
| 75% |
6.350400e+04 |
| max |
9.999000e+04 |
Out[311]:
'geolocation_zip_code_prefix'
WARNING:param.OverlayPlot103663: No plotting class for WMTS type and bokeh backend found.
Lets look at the state of Sao Paulo (SP) to see how zip code prefixes works in a regional level. We see that:
- zip code prefixes in Sao Paulo state ranges from 01001 to 19990
- zip codes starting with 0 are in the Sao Paulo meto region
- zip codes starting with 1 are in the interior of the state
Out[130]:
|
geolocation_zip_code_prefix |
geolocation_lat |
geolocation_lng |
geolocation_city |
geolocation_state |
geolocation_zip_code_prefix_1_digits |
geolocation_zip_code_prefix_2_digits |
geolocation_zip_code_prefix_3_digits |
geolocation_zip_code_prefix_4_digits |
x |
y |
| 0 |
1037 |
-23.545621 |
-46.639292 |
sao paulo |
SP |
0 |
1 |
10 |
103 |
-5.191862e+06 |
-2.698137e+06 |
| 1 |
1046 |
-23.546081 |
-46.644820 |
sao paulo |
SP |
0 |
1 |
10 |
104 |
-5.192478e+06 |
-2.698193e+06 |
Out[135]:
|
geolocation_zip_code_prefix |
| count |
402651.000000 |
| mean |
9030.831782 |
| std |
5101.941134 |
| min |
1001.000000 |
| 25% |
4661.000000 |
| 50% |
8257.000000 |
| 75% |
13330.000000 |
| max |
19990.000000 |
WARNING:param.OverlayPlot69111: No plotting class for WMTS type and bokeh backend found.
Zip Codes in Large Cities¶
Lets look at the city of Sao Paulo to see how zip code prefixes works in a city level. We see that:
- zip code prefixes in Sao Paulo city ranges from 01001 to 09540
- zip code prefixes are somehow related to neighborhoods or city districts
Out[158]:
|
geolocation_zip_code_prefix |
geolocation_lat |
geolocation_lng |
geolocation_city |
geolocation_state |
geolocation_zip_code_prefix_1_digits |
geolocation_zip_code_prefix_2_digits |
geolocation_zip_code_prefix_3_digits |
geolocation_zip_code_prefix_4_digits |
x |
y |
| 0 |
1037 |
-23.545621 |
-46.639292 |
sao paulo |
SP |
0 |
1 |
10 |
103 |
-5.191862e+06 |
-2.698137e+06 |
| 1 |
1046 |
-23.546081 |
-46.644820 |
sao paulo |
SP |
0 |
1 |
10 |
104 |
-5.192478e+06 |
-2.698193e+06 |
| 2 |
1046 |
-23.546129 |
-46.642951 |
sao paulo |
SP |
0 |
1 |
10 |
104 |
-5.192270e+06 |
-2.698199e+06 |
Out[159]:
|
geolocation_zip_code_prefix |
| count |
135256.000000 |
| mean |
4137.454996 |
| std |
1804.189554 |
| min |
1001.000000 |
| 25% |
2941.000000 |
| 50% |
4166.000000 |
| 75% |
5158.000000 |
| max |
9540.000000 |
WARNING:param.OverlayPlot71685: No plotting class for WMTS type and bokeh backend found.
Zip Codes in Small Cities¶
Lets look at the city of Atibaia to see how zip code prefixes works in a city level. We see that:
- zip code prefix of Atibaia city is between 12940 and 12954
- but there are other neighbor cities with the same zip code prefix
- to have more detail and go down to a city level we would probably need more zip code digits (the 4th and 5th digit)
Out[164]:
|
geolocation_zip_code_prefix |
| count |
1254.000000 |
| mean |
12945.042265 |
| std |
4.197704 |
| min |
12940.000000 |
| 25% |
12942.000000 |
| 50% |
12944.000000 |
| 75% |
12948.000000 |
| max |
12954.000000 |
WARNING:param.OverlayPlot72543: No plotting class for WMTS type and bokeh backend found.
Abrangence of zip code digits¶
What does every digit in the zip code means? Lets zee how it behave, from 1 to 5 digits.
Zip codes starting with 2 are all from Rio de Janeiro (RJ) and Espírito Santo (ES) States
Zip codes starting with 22 are all within the Rio de Janeiro City, but they represent multiple neighborhoods
At the from right to left you see neighborhoods such as Leme, Copacabana, Ipanema, Leblon and Barra da Tijuca.
Zip codes starting with 220 are all within Copacabana, a neighborhood of Rio de Janeiro
Zip codes starting with 2201 represent some streets of Copacabana
Zip codes starting with 22010 represent a more strict set of streets of Copacabana
So, now it should be clear to you that as we increase the quantity of numbers on zip codes we are selecting subsets of data.
Where does most revenue comes from?¶
Working with 5 digits of zip codes might lead us to very small samples in some areas, so we will work with 3 digits. Plotting the sum of products value grouped by zip code prefix we see that most of the revenue came from the Southeast and South regions of Brazil. It is also possible to see that large cities and capitals, where population is bigger, have larger participation on revenue.
Out[210]:
|
geolocation_zip_code_prefix |
geolocation_lat |
geolocation_lng |
geolocation_city |
geolocation_state |
geolocation_zip_code_prefix_1_digits |
geolocation_zip_code_prefix_2_digits |
geolocation_zip_code_prefix_4_digits |
x |
y |
price |
| 10 |
1037 |
-23.545621 |
-46.639292 |
sao paulo |
SP |
0 |
1 |
103 |
-5.191862e+06 |
-2.698137e+06 |
24073.51 |
| 10 |
1046 |
-23.546081 |
-46.644820 |
sao paulo |
SP |
0 |
1 |
104 |
-5.192478e+06 |
-2.698193e+06 |
24073.51 |
| 10 |
1046 |
-23.546129 |
-46.642951 |
sao paulo |
SP |
0 |
1 |
104 |
-5.192270e+06 |
-2.698199e+06 |
24073.51 |
| 10 |
1041 |
-23.544392 |
-46.639499 |
sao paulo |
SP |
0 |
1 |
104 |
-5.191885e+06 |
-2.697988e+06 |
24073.51 |
| 10 |
1035 |
-23.541578 |
-46.641607 |
sao paulo |
SP |
0 |
1 |
103 |
-5.192120e+06 |
-2.697646e+06 |
24073.51 |
| ... |
... |
... |
... |
... |
... |
... |
... |
... |
... |
... |
... |
| 999 |
99950 |
-28.068639 |
-52.010705 |
tapejara |
RS |
9 |
99 |
9995 |
-5.789805e+06 |
-3.257630e+06 |
4186.04 |
| 999 |
99900 |
-27.877125 |
-52.224882 |
getulio vargas |
RS |
9 |
99 |
9990 |
-5.813647e+06 |
-3.233491e+06 |
4186.04 |
| 999 |
99950 |
-28.071855 |
-52.014716 |
tapejara |
RS |
9 |
99 |
9995 |
-5.790252e+06 |
-3.258036e+06 |
4186.04 |
| 999 |
99980 |
-28.388932 |
-51.846871 |
david canabarro |
RS |
9 |
99 |
9998 |
-5.771567e+06 |
-3.298098e+06 |
4186.04 |
| 999 |
99950 |
-28.070104 |
-52.018658 |
tapejara |
RS |
9 |
99 |
9995 |
-5.790690e+06 |
-3.257815e+06 |
4186.04 |
1000121 rows × 11 columns
WARNING:param.OverlayPlot73401: No plotting class for WMTS type and bokeh backend found.
What is the Average Ticket?¶
Here we see something somehow unexpected. Customers of the south and southeast regions of Brazil have lower average ticket, than their peers on north and norteast. This might happen because they have to pay more for freight (let's check that in a moment)
WARNING:param.OverlayPlot74259: No plotting class for WMTS type and bokeh backend found.
Who pays more for transportation?¶
We might find a freight ratio by dividing the freight value by the order value. This ratio indicates the percentage of the product price that a person had to pay just to get their order delivered. For example, if a product costs R$50.00 and the freight value was R$10.00, then the freight ratio is 0.2 or 20%. Higher freight ratios are very likely to discourage customers to complete a purchase. Due to logistics costs, we expect to see lower freight ratios in densely populated areas and are higher freight ratios on sparsely poulated regions.
WARNING:param.OverlayPlot75117: No plotting class for WMTS type and bokeh backend found.
Unfortunately, who lives in the north and northeast of Brazil has to bear with higher freight costs and has to wait longer to receive their purchase.
WARNING:param.OverlayPlot75975: No plotting class for WMTS type and bokeh backend found.
Lets look to delivery times at a state level. We filtered only orders from Parana state (PR). It is possible to see that larger cities have lower average delivery times than the smaller ones.
WARNING:param.OverlayPlot81123: No plotting class for WMTS type and bokeh backend found.
Interesting Point About Brazilian Suburbs¶
Unlike other countries, in Brazil the richest areas usualy are near downtow and suburbs are know for poverty and high violence rates. Lets explore that in Rio the Janeiro.
WARNING:param.OverlayPlot92297: No plotting class for WMTS type and bokeh backend found.
It turns out that if you live in rich neighborhoods such as Downtown, Botafogo, Copacabana and Flamengo you are likey to receive your order five days earlier than someone who lives in a poor neighborhood such as Cidade de Deus or Bangu. We see the same pattern in Sao Paulo and Porto Alegre, customers near downtown receive their orders faster than those who lives on suburbs.
WARNING:param.OverlayPlot83697: No plotting class for WMTS type and bokeh backend found.
WARNING:param.OverlayPlot84555: No plotting class for WMTS type and bokeh backend found.
Orders Average Review Score¶
Customers of Rio de Janeiro State and Northeast Region are more likely to give low scores on purchases.
WARNING:param.OverlayPlot86271: No plotting class for WMTS type and bokeh backend found.
Let's look at the average review score in Rio de Janeiro? Do you see any relationship with average delivery time?
WARNING:param.OverlayPlot87987: No plotting class for WMTS type and bokeh backend found.
Lets measure the percentage of delayed orders and see how that relate to the score given by customers?
WARNING:param.OverlayPlot88845: No plotting class for WMTS type and bokeh backend found.
Looking again to Rio de Janeiro to view where order deliveries are more prone to be delayed.
WARNING:param.OverlayPlot89703: No plotting class for WMTS type and bokeh backend found.
Does customers from small towns buy more items on a single order¶
Lets check the average item quantity per order to see if customers that are further out of large cities buy in bulk.
WARNING:param.OverlayPlot91142: No plotting class for WMTS type and bokeh backend found.
Apparently that hypothesis is false. Looks like that customers from large cities buy more products per order. But there isn't a clear geographical pattern that would allow us to draw any further conclusion.
Work in progress...
To do:
Which categories are most sold.
Wich payment method was chosen.
How many installments.
Analysis on specific cities, such as Sao Paulo, Porto Alegre, Curitiba, Fortaleza, Bahia, Brasilia.
Any sugestion?