Phase 3: Data Preparation

Combining and structuring all relevant data for modelling

The Analytical Base Table

## ✓ Proximity to Main Road calculated.
## Searching for transit nodes (bus stops/stations and taxi ranks)...
## ✓ Transit nodes used: 21
## ✓ Direct Competition count calculated.
## ✓ Indirect Competition count calculated.
## Added demographics: 40 / 40 rows filled for population density.
## 
## --- Final Analytical Base Table (First 10 Rows) ---
##    Site_ID                                                        Description
## 1        1 5 Tongaat, Plankenbrug (Industrial building, ground 1 & 1st floor)
## 2        2                    1 Webersvallei Rd, Jamestown (Office/Warehouse)
## 3        3                   6 Lappan Rd, Tennantville (Industrial, 2 titles)
## 4        4                6 Lappan Rd, Tennantville (Industrial, single unit)
## 5        5                 Cnr Route 44 & Annandale Rd, Stellenbosch (Root44)
## 6        6               5 Tongaat St, Kayamandi (Industrial/Commercial unit)
## 7        7      1 La Rez, 226 Bird St, Stellenbosch Central (Commercial unit)
## 8        8   27 Tarentaal Rd, Papegaaiberg Ind. Park (Retirement/Vacant unit)
## 9        9                                                Mount Vernon Estate
## 10      10                                        Klapmuts (Residential site)
##    Size_m2  Latitude Longitude Drive_Thru_Possible In_Line_Possible
## 1      250 -33.93870  18.84500                   0                1
## 2      413 -33.98360  18.83530                   0                1
## 3     1050 -33.93755  18.85960                   0                1
## 4      525 -33.93755  18.85960                   0                1
## 5     4830 -33.99625  18.82865                   1                1
## 6      330 -33.91670  18.85000                   0                1
## 7     1105 -33.93290  18.86120                   0                1
## 8      120 -33.93620  18.83480                   0                1
## 9      797 -33.83070  18.82590                   0                1
## 10     803 -33.81056  18.86257                   0                1
##    Proximity_Main_Road_m Proximity_Transit_Node_m Direct_Competition_1km
## 1                    344                      417                     11
## 2                   1136                     5115                      0
## 3                      0                      518                     15
## 4                      0                      518                     15
## 5                   2663                     6633                      0
## 6                     71                      703                      0
## 7                    103                      246                     17
## 8                    702                     1394                      0
## 9                   4932                     8776                      0
## 10                  6811                    10735                      0
##    Indirect_Competition_1km                       Local_Area_source
## 1                        12       Cloetesville SP (Sub Place, 2011)
## 2                         0       Paradyskloof SP (Sub Place, 2011)
## 3                       104         Tennantville (Main Place, 2011)
## 4                       104         Tennantville (Main Place, 2011)
## 5                         0       Stellenbosch SP (Sub Place, 2011)
## 6                         5         Khayamandi SP (Sub Place, 2011)
## 7                       100       Stellenbosch SP (Sub Place, 2011)
## 8                         0 Onder Papegaaiberg SP (Sub Place, 2011)
## 9                         0       Stellenbosch SP (Sub Place, 2011)
## 10                        0             Klapmuts (Main Place, 2011)
##    Population_Density_people_km2 Avg_Household_Income_R_yr
## 1                        8860.18                     14600
## 2                        1255.21                    230700
## 3                        4934.65                     29400
## 4                        4934.65                     29400
## 5                        2613.52                    230700
## 6                       15968.35                     14600
## 7                        2613.52                    225000
## 8                         846.22                    225000
## 9                        2613.52                     29400
## 10                       4364.83                     29400
##    Income_basis_ward_muni
## 1                  Ward12
## 2                  Ward21
## 3    Municipality (WC024)
## 4    Municipality (WC024)
## 5                  Ward21
## 6                  Ward12
## 7                  Ward22
## 8                  Ward22
## 9    Municipality (WC024)
## 10                  Ward8
## 
## --- Summary of New Features ---
##  Drive_Thru_Possible In_Line_Possible Proximity_Main_Road_m
##  Min.   :0.000       Min.   :1        Min.   :   0         
##  1st Qu.:0.000       1st Qu.:1        1st Qu.:  48         
##  Median :0.000       Median :1        Median : 249         
##  Mean   :0.025       Mean   :1        Mean   :1375         
##  3rd Qu.:0.000       3rd Qu.:1        3rd Qu.:1518         
##  Max.   :1.000       Max.   :1        Max.   :6811         
##  Proximity_Transit_Node_m Direct_Competition_1km Indirect_Competition_1km
##  Min.   :  246.0          Min.   : 0.00          Min.   :  0.00          
##  1st Qu.:  497.8          1st Qu.: 0.00          1st Qu.:  0.00          
##  Median : 2970.0          Median : 0.00          Median :  2.00          
##  Mean   : 3254.3          Mean   : 4.85          Mean   : 24.88          
##  3rd Qu.: 5347.8          3rd Qu.:12.00          3rd Qu.: 11.25          
##  Max.   :10735.0          Max.   :22.00          Max.   :105.00          
##  Local_Area_source  Population_Density_people_km2 Avg_Household_Income_R_yr
##  Length:40          Min.   :  846.2               Min.   : 14600           
##  Class :character   1st Qu.: 1255.2               1st Qu.: 29400           
##  Mode  :character   Median : 2613.5               Median : 29400           
##                     Mean   : 3109.7               Mean   :110240           
##                     3rd Qu.: 2613.5               3rd Qu.:230700           
##                     Max.   :15968.4               Max.   :230700           
##  Income_basis_ward_muni
##  Length:40             
##  Class :character      
##  Mode  :character      
##                        
##                        
## 
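
The log above reports proximity and 1 km competition features but does not show how they were derived. The following is a minimal Python sketch of one plausible approach, assuming straight-line (great-circle) distance between coordinates; the original analysis may well have used road-network distance or a projected coordinate system instead. The function names and the three reference points are hypothetical and serve only as illustration; the two site coordinates are taken from the table.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_distance_m(site, points):
    """Distance from a site to its closest reference point (e.g. a transit node)."""
    return min(haversine_m(*site, *p) for p in points)


def count_within(site, points, radius_m=1000):
    """Number of reference points (e.g. competitors) within radius_m of a site."""
    return sum(haversine_m(*site, *p) <= radius_m for p in points)


# Coordinates of Site_ID 1 and Site_ID 7, as listed in the table.
site_1 = (-33.93870, 18.84500)
site_7 = (-33.93290, 18.86120)

# Hypothetical reference points, NOT taken from the analysis.
hypothetical_nodes = [
    (-33.93800, 18.84600),
    (-33.93300, 18.86100),
    (-33.90000, 18.90000),
]

print(round(nearest_distance_m(site_1, hypothetical_nodes)))
print(count_within(site_1, hypothetical_nodes, radius_m=1000))
```

Under this assumption, a feature such as Proximity_Transit_Node_m would be the nearest-point distance, and Direct_Competition_1km would be a count within a 1000 m radius.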

The Data Quality Report

Data Quality Report — Continuous Features

| Feature | Count | % Miss. | Card. | Min | 1st Qrt. | Mean | Median | 3rd Qrt. | Max | Std. Dev. |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Avg_Household_Income_R_yr | 40 | 0 | 5 | 14600 | 29400 | 110240 | 29400 | 230700 | 230700 | 99180.33 |
| Direct_Competition_1km | 40 | 0 | 7 | 0 | 0 | 4.85 | 0 | 12 | 22 | 7.091165 |
| Drive_Thru_Possible | 40 | 0 | 2 | 0 | 0 | 0.025 | 0 | 0 | 1 | 0.1581139 |
| In_Line_Possible | 40 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| Indirect_Competition_1km | 40 | 0 | 9 | 0 | 0 | 24.875 | 2 | 11.25 | 105 | 43.09155 |
| Latitude | 40 | 0 | 26 | -33.99625 | -33.96550 | -33.93297 | -33.93714 | -33.92695 | -33.81056 | 0.043019 |
| Longitude | 40 | 0 | 26 | 18.78590 | 18.83517 | 18.85247 | 18.85657 | 18.86257 | 18.96990 | 0.0356057 |
| Population_Density_people_km2 | 40 | 0 | 9 | 846.22 | 1255.21 | 3109.675 | 2613.52 | 2613.52 | 15968.35 | 2720.272 |
| Proximity_Main_Road_m | 40 | 0 | 26 | 0 | 48 | 1375.15 | 249 | 1517.75 | 6811 | 2051.77 |
| Proximity_Transit_Node_m | 40 | 0 | 26 | 246 | 497.75 | 3254.325 | 2970 | 5347.75 | 10735 | 3103.057 |
| Site_ID | 40 | 0 | 40 | 1 | 10.75 | 20.5 | 20.5 | 30.25 | 40 | 11.69045 |
| Size_m2 | 40 | 0 | 36 | 120 | 257.5 | 563.575 | 327.5 | 583 | 4830 | 759.1316 |

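
To make the columns of the continuous-feature report concrete, here is a minimal Python sketch of how one row could be computed. It is an illustration only, not the code that produced the report: the function name is hypothetical, `None` is assumed to mark a missing value, and quartiles are assumed to use linear interpolation between order statistics (the default in R's `quantile`, which `method="inclusive"` reproduces). The input assumes Site_ID is simply the integers 1 to 40, which is consistent with the reported count, cardinality, minimum and maximum.

```python
import statistics


def continuous_dqr_row(values):
    """Summarise one continuous feature; None is treated as missing."""
    present = sorted(v for v in values if v is not None)
    # Quartiles by linear interpolation between order statistics.
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "Count": len(values),
        "% Miss.": 100 * (len(values) - len(present)) / len(values),
        "Card.": len(set(present)),        # number of distinct values
        "Min": present[0],
        "1st Qrt.": q1,
        "Mean": statistics.fmean(present),
        "Median": median,
        "3rd Qrt.": q3,
        "Max": present[-1],
        "Std. Dev.": statistics.stdev(present),  # sample standard deviation (n - 1)
    }


# Assumption: Site_ID is the sequence 1..40.
site_ids = list(range(1, 41))
row = continuous_dqr_row(site_ids)
for name, value in row.items():
    print(f"{name:>10}: {value}")
```

For this input the sketch gives a first quartile of 10.75, a mean and median of 20.5, a third quartile of 30.25 and a standard deviation of about 11.69045, in line with the Site_ID row of the report.
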
Data Quality Report — Categorical Features

| Feature | Count | % Miss. | Card. | Mode | Mode Freq. | Mode % | 2nd Mode | 2nd Mode Freq. | 2nd Mode % |
|:---|---:|---:|---:|:---|---:|---:|:---|---:|---:|
| Description | 40 | 0 | 38 | Longlands Estate | 3 | 7.5 | 1 La Rez, 226 Bird St, Stellenbosch Central (Commercial unit) | 1 | 2.5 |
| Income_basis_ward_muni | 40 | 0 | 6 | Municipality (WC024) | 15 | 37.5 | Ward21 | 11 | 27.5 |
| Local_Area_source | 40 | 0 | 9 | Stellenbosch SP (Sub Place, 2011) | 21 | 52.5 | Paradyskloof SP (Sub Place, 2011) | 9 | 22.5 |
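
The categorical columns can be illustrated in the same way. The sketch below is a hypothetical Python helper, not the code behind the report; it assumes percentages are taken relative to the total row count and that ties are broken by first appearance. The sample is the Income_basis_ward_muni column for the first 10 rows only, as printed in the Analytical Base Table, so its figures differ from the 40-row report.

```python
from collections import Counter


def categorical_dqr_row(values):
    """Summarise one categorical feature; None is treated as missing."""
    present = [v for v in values if v is not None]
    ranked = Counter(present).most_common(2)   # ties keep first-seen order
    mode, mode_n = ranked[0]
    second, second_n = ranked[1] if len(ranked) > 1 else (None, 0)
    total = len(values)
    return {
        "Count": total,
        "% Miss.": 100 * (total - len(present)) / total,
        "Card.": len(set(present)),
        "Mode": mode,
        "Mode Freq.": mode_n,
        "Mode %": 100 * mode_n / total,
        "2nd Mode": second,
        "2nd Mode Freq.": second_n,
        "2nd Mode %": 100 * second_n / total,
    }


# Income_basis_ward_muni for Site_ID 1 to 10, copied from the table above.
income_basis_first_10 = [
    "Ward12", "Ward21", "Municipality (WC024)", "Municipality (WC024)",
    "Ward21", "Ward12", "Ward22", "Ward22", "Municipality (WC024)", "Ward8",
]
cat_row = categorical_dqr_row(income_basis_first_10)
for name, value in cat_row.items():
    print(f"{name:>15}: {value}")
```

In this 10-row sample three wards tie for second place with two sites each, so the reported 2nd Mode depends on the tie-breaking rule; a production report would need to state which rule it uses.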