Context

Team smashAvo has been gathering datasets to support research on Sydney Housing. Data vendor offers different data format e.g. csv, spreadsheet, pdf. Data vendor also offers various method to acquire data e.g. direct download, API e.g SDMX.

The team has collected 13 datasets. The most important dataset is raw housing sales data. This data is only available commercially. We are fortunate to be able to access this free for this research. However we can not retrieve data directly. It is supplied to us in multi-tab spreadsheet. There is also limit on how much data we can acquire.

Because of this constraint we are limiting this research by randomly selecting following LGAs representing Sydney Housing.

Via SDMX

  • D01 - Weekly Rental by Landlord
  • D02 - Tenure type and landlord type by dwelling structure
library(rsdmx)
d01_url <- "http://stat.data.abs.gov.au/restsdmx/sdmx.ashx/GetData/ABS_CENSUS2011_T19_LGA/TOT+10+20+31_32+60+40_51_52+Z.TOT+01_02+03+04_05+06_07+08+09_10+11_13+14_17+18+19+20+Z.1+2+3+4+5+6+7+8+9.LGA2011.10200+10350+10750+13950+14000+14100+14150+14500+14700+15350+15950+16250+16700+18500.A/all?startTime=2001&endTime=2011"
d01_df <- as.data.frame(readSDMX(d01_url))
head(d01_df)
##   MEASURE RNTRD STATE REGIONTYPE REGION FREQUENCY TIME_FORMAT obsTime
## 1     TOT   TOT     1    LGA2011  10200         A         P1Y    2001
## 2     TOT   TOT     1    LGA2011  10200         A         P1Y    2006
## 3     TOT   TOT     1    LGA2011  10200         A         P1Y    2011
## 4     TOT 01_02     1    LGA2011  10200         A         P1Y    2001
## 5     TOT 01_02     1    LGA2011  10200         A         P1Y    2006
## 6     TOT 01_02     1    LGA2011  10200         A         P1Y    2011
##   obsValue
## 1     6076
## 2     7655
## 3     8718
## 4      632
## 5      471
## 6      319
library(rsdmx)
d02_url <- "http://stat.data.abs.gov.au/restsdmx/sdmx.ashx/GetData/ABS_CENSUS2011_T18_LGA/TOT+OPD+1+2+3+9+Z.TOT+1+2+3_8+3+4+5+6+7+8+9+Z.1+2+3+4+5+6+7+8+9.LGA2011.10200+10350+10750+13950+14000+14100+14150+14500+14700+15350+15950+16250+16700+18500.A/all?startTime=2001&endTime=2011"
d02_df <- as.data.frame(readSDMX(d02_url))
head(d02_df)
##   MEASURE TENLLD STATE REGIONTYPE REGION FREQUENCY TIME_FORMAT obsTime
## 1     TOT    TOT     1    LGA2011  10200         A         P1Y    2001
## 2     TOT    TOT     1    LGA2011  10200         A         P1Y    2006
## 3     TOT    TOT     1    LGA2011  10200         A         P1Y    2011
## 4     TOT      3     1    LGA2011  10200         A         P1Y    2001
## 5     TOT      3     1    LGA2011  10200         A         P1Y    2006
## 6     TOT      3     1    LGA2011  10200         A         P1Y    2011
##   obsValue
## 1    17139
## 2    20786
## 3    23453
## 4     3610
## 5     5206
## 6     5912

Via read_excel

  • D03 - Dwelling Median Sale Prices
  • D04 - Median (weekly) Rent
  • D06 - NSW Population Projection Data
  • D07 - NSW Recorded Crime Statistics 2012 to 2016
  • D09 - All Property Price Index
  • D10 - Residential Property Index
  • D11 - Residential Property Index and percentage change
  • D12 - Socio-economic Indexes for Areas (SEIFA)
  • D13 - Sydney Housing Sale Price
  • D14 - LGA Postcode Mapping
  • D15 - Consumer Index Price

The dataset will be downloaded locally first.

d03_filename <- “D03 Sales_GMR_Trend_91q1-16q4.xlsx”
readxl::excel_sheets(d03_filename)
d03_df_ad <- readxl::read_excel(d03_filename, sheet = “ad” )

d04_filename <- “D04 Rent_GMR_Trend_90q1-17q1.xlsx”
readxl::excel_sheets(d04_filename)
d04_df_AllBR_A <- readxl::read_excel(d04_filename, sheet = “AllBR_A” )

Via fread - directly from website

  • D05 - NSW Government School locations and student enrollment number
library(data.table)
d05_df <- fread('https://data.cese.nsw.gov.au/data/dataset/027493b2-33ad-3f5b-8ed9-37cdca2b8650/resource/2ac19870-44f6-443d-a0c3-4c867f04c305/download/nswpublicschoolsmasterdataset.csv')
## Warning in fread("https://data.cese.nsw.gov.au/data/dataset/
## 027493b2-33ad-3f5b-8ed9-37cdca2b8650/resource/2ac19870-44f6-443d-
## a0c3-4c867f04c305/download/nswpublicschoolsmasterdataset.csv"): Bumped
## column 10 to type character on data row 347, field contains 'np'. Coercing
## previously read values in this column from logical, integer or numeric back
## to character which may not be lossless; e.g., if '00' and '000' occurred
## before they will now be just '0', and there may be inconsistencies with
## treatment of ',,' and ',NA,' too (if they occurred in this column before
## the bump). If this matters please rerun and set 'colClasses' to 'character'
## for this column. Please note that column type detection uses a sample of
## 1,000 rows (100 rows at 10 points) so hopefully this message should be very
## rare. If reporting to datatable-help, please rerun and include the output
## from verbose=TRUE.
head(d05_df)
##    School_code AgeID              School_name             Street
## 1:        1001  4402 Abbotsford Public School 350 Great North Rd
## 2:        1002  9333   Aberdeen Public School        Segenhoe St
## 3:        1003  9299   Abermain Public School        Goulburn St
## 4:        1007 10425  Adaminaby Public School  9 Cosgrove Street
## 5:        1008  9043  Adamstown Public School          Bryant St
## 6:        1009 10667    Adelong Public School         Gilmore St
##    Town_suburb Postcode     Phone                       School_Email
## 1:  Abbotsford     2046 9713 6220 abbotsford-p.school@det.nsw.edu.au
## 2:    Aberdeen     2336 6543 7271   aberdeen-p.school@det.nsw.edu.au
## 3:    Abermain     2326 4930 4210   abermain-p.school@det.nsw.edu.au
## 4:   Adaminaby     2629 6454 2265  adaminaby-p.school@det.nsw.edu.au
## 5:   Adamstown     2289 4957 1114  adamstown-p.school@det.nsw.edu.au
## 6:     Adelong     2729 6946 2053    adelong-p.school@det.nsw.edu.au
##          Fax Student_number Indigenous_pct LBOTE_pct ICSEA_value
## 1: 9712 1825          602.0             np      38.0        1117
## 2: 6543 7712          241.0           10.0       4.0         913
## 3: 4930 4319          264.0           22.0        np         901
## 4: 6454 2552           17.0            0.0       0.0         985
## 5: 4956 2446          263.0            5.0      11.0        1027
## 6: 6946 2199           81.0             np       0.0         965
##    Level_of_schooling Selective_school Opportunity_class
## 1:     Primary School    Not Selective                 N
## 2:     Primary School    Not Selective                 N
## 3:     Primary School    Not Selective                 N
## 4:     Primary School    Not Selective                 N
## 5:     Primary School    Not Selective                 N
## 6:     Primary School    Not Selective                 N
##    School_specialty_type   School_subtype Support_classes Preschool_ind
## 1:         Comprehensive Kinder to Year 6              NA             N
## 2:         Comprehensive Kinder to Year 6              NA             N
## 3:         Comprehensive Kinder to Year 6              NA             N
## 4:         Comprehensive Kinder to Year 6              NA             N
## 5:         Comprehensive Kinder to Year 6              NA             N
## 6:         Comprehensive Kinder to Year 6              NA             N
##    Distance_education Intensive_english_centre School_gender
## 1:                  N                        N          Coed
## 2:                  N                        N          Coed
## 3:                  N                        N          Coed
## 4:                  N                        N          Coed
## 5:                  N                        N          Coed
## 6:                  N                        N          Coed
##    Late_opening_school Date_1st_teacher                    LGA
## 1:                   N       1925-04-01         Canada Bay (A)
## 2:                   N       1864-02-01 Upper Hunter Shire (A)
## 3:                   N       1905-08-01           Cessnock (C)
## 4:                   N       1869-01-01        Snowy River (A)
## 5:                   N       1877-07-01          Newcastle (C)
## 6:                   N       1860-03-01        Tumut Shire (A)
##      Electorate Fed_electorate Operational_directorate Principal_network
## 1:    Drummoyne           Reid                  Ultimo       Strathfield
## 2: Upper Hunter    New England                Tamworth           Wollemi
## 3:     Cessnock       Paterson                Tamworth            Hunter
## 4:       Monaro    Eden-Monaro             Wagga Wagga        Queanbeyan
## 5:    Newcastle      Newcastle                Tamworth         Newcastle
## 6:  Wagga Wagga    Eden-Monaro             Wagga Wagga       Cootamundra
##                                     FACS_district Local_health_district
## 1: South Eastern Sydney, Northern Sydney & Sydney                Sydney
## 2:             Hunter New England & Central Coast    Hunter New England
## 3:             Hunter New England & Central Coast    Hunter New England
## 4:            Illawarra Shoalhaven & Southern NSW          Southern NSW
## 5:             Hunter New England & Central Coast    Hunter New England
## 6:           Murrumbidgee, Far West & Western NSW          Murrumbidgee
##          AECG_region           ASGS_remoteness  Latitude Longitude
## 1: Metropolitan East Major Cities of Australia -33.85273  151.1312
## 2:            Hunter  Inner Regional Australia -32.16610  150.8881
## 3:            Hunter Major Cities of Australia -32.80892  151.4265
## 4: Lower South Coast  Outer Regional Australia -35.99329  148.7767
## 5:            Hunter Major Cities of Australia -32.93221  151.7310
## 6:        Riverina 1  Inner Regional Australia -35.31233  148.0628
##             Assets unit Date_extracted
## 1:               Sydney     2017-09-24
## 2: Hunter/Central Coast     2017-09-24
## 3: Hunter/Central Coast     2017-09-24
## 4:         Southern NSW     2017-09-24
## 5: Hunter/Central Coast     2017-09-24
## 6:         Southern NSW     2017-09-24