Setup

Introduction

The data powering Neighborhood Data for Social Change is a single dataset https://usc.data.socrata.com/Los-Angeles/Citizen-Connect-Variables/u7m9-48qx/data with all variables that contain counts and other numeric data. Here is a demo on how to publish https://usc.data.socrata.com/Los-Angeles/Rent-Price/4a97-v5tx to Citizen Connect.

Motivations for Script

  • Verify that an individual dataset contains consistent data already in Citizen Connect
    • Policy Area
    • Dataset
    • Variable
    • Tract Name
  • Join on Denominators if needed so that any variables require weighting Citizen Connect can perform that calculation
  • Create backup files of data
  • Provide publishing capability
    • DataSync Delete & Upsert files
    • API Upsert (& Delete if available)

Getting Started with R

Load the Libraries

If you need to install any of these packages run - install.packages("RSocrata")

Citizen Connect Data

##                 Policy_Area                   Dataset      
##  Demography           :352284   Jobs & Industries :127620  
##  Employment & Income  :169615   Race and Ethnicity:112496  
##  Housing & Real Estate: 74695   Age Distribution  : 97867  
##  Transportation       : 53529   Households        : 65223  
##  Social Connectedness : 12385   Immigration       : 51564  
##  Environment          :  7015   Poverty           : 28124  
##  (Other)              :  7028   (Other)           :193657  
##                               Variable          Name          
##  Total                            : 15762   Length:676551     
##  American Indian/Native Population: 14062   Class :character  
##  Asian Population                 : 14062   Mode  :character  
##  Black Population                 : 14062                     
##  Hispanic Population              : 14062                     
##  Homeownership Rate               : 14062                     
##  (Other)                          :590479                     
##       Date               geoid               Tract       
##  Min.   :2009-01-01   Length:676551      Min.   :101110  
##  1st Qu.:2010-01-01   Class :character   1st Qu.:211701  
##  Median :2012-01-01   Mode  :character   Median :404803  
##  Mean   :2012-02-29                      Mean   :406429  
##  3rd Qu.:2014-01-01                      3rd Qu.:552200  
##  Max.   :2016-01-01                      Max.   :990300  
##                                                          
##      Value            Denominator        Denominator_Description
##  Min.   :     0.00   Min.   :0.000e+00   Length:676551          
##  1st Qu.:    29.94   1st Qu.:1.571e+03   Class :character       
##  Median :   214.00   Median :3.173e+03   Mode  :character       
##  Mean   :  1846.46   Mean   :5.374e+04                          
##  3rd Qu.:   771.00   3rd Qu.:4.611e+03                          
##  Max.   :250001.00   Max.   :1.029e+09                          
##  NA's   :2855        NA's   :53925                              
##     Row_ID           Location 1           Latitude       Longitude     
##  Length:676551      Length:676551      Min.   :33.14   Min.   :-118.9  
##  Class :character   Class :character   1st Qu.:33.96   1st Qu.:-118.4  
##  Mode  :character   Mode  :character   Median :34.05   Median :-118.3  
##                                        Mean   :34.06   Mean   :-118.3  
##                                        3rd Qu.:34.15   3rd Qu.:-118.1  
##                                        Max.   :34.79   Max.   :-117.7  
##                                        NA's   :4688    NA's   :4688

Review the Metadata to ensure we are looking at right data and completeness

## [1] "Name:Rent Burden"
## [1] "Description:The percentage of renters paying more than 30 percent of their monthly income on rent and utilities"
## [1] "Last Updated:2017-10-10T00:40:53+0000"
## [1] "Source:Census"
## [1] "Source URL:https://factfinder.census.gov"

New Dataset

##                 Policy_Area           Dataset     
##  Housing & Real Estate:14062   Rent Burden:14062  
##                                                   
##                                                   
##                                                   
##                                                   
##                                                   
##                      Variable          Year          Count       
##  Rent-Burdened Population:14062   Min.   :2010   Min.   :   0.0  
##                                   1st Qu.:2011   1st Qu.: 178.0  
##                                   Median :2013   Median : 365.0  
##                                   Mean   :2013   Mean   : 407.8  
##                                   3rd Qu.:2014   3rd Qu.: 575.0  
##                                   Max.   :2015   Max.   :2584.0  
##    Percent             Tract            Tract_Number    Neighborhood      
##  Length:14062       Length:14062       Min.   :101110   Length:14062      
##  Class :character   Class :character   1st Qu.:211500   Class :character  
##  Mode  :character   Mode  :character   Median :404803   Mode  :character  
##                                        Mean   :407607                     
##                                        3rd Qu.:552602                     
##                                        Max.   :990300                     
##     GEOID             Location            Latitude       Longitude     
##  Length:14062       Length:14062       Min.   :33.14   Min.   :-118.9  
##  Class :character   Class :character   1st Qu.:33.96   1st Qu.:-118.4  
##  Mode  :character   Mode  :character   Median :34.05   Median :-118.3  
##                                        Mean   :34.07   Mean   :-118.3  
##                                        3rd Qu.:34.15   3rd Qu.:-118.1  
##                                        Max.   :34.79   Max.   :-117.7  
##     Row_ID               Date           
##  Length:14062       Min.   :2010-01-01  
##  Class :character   1st Qu.:2011-01-01  
##  Mode  :character   Median :2013-01-01  
##                     Mean   :2012-07-02  
##                     3rd Qu.:2014-01-01  
##                     Max.   :2015-01-01

Conditions for Publishing to Citizen Connect

Verify the New data’s Policy Area, Dataset and Variable

Inconsistent spelling and case within a dataset will cause additional values to appear in Citizen Connect

## [1] "Policy Area and Datset values look good!"

Visualize the Variables to detect anomolies

Inspect the data visually.

Check that we have at least 1 Numeric Count, Amount, Score or Value in the Dataset

Percent valuess do not aggregate correctly across neighborhoods, so datasets with only those values should be omitted from Citizen Connect.

## [1] "Dataset values look good! There is at least one numeric value."

Filter the Citizen Connect data on the New data’s Policy Area, Dataset and Variable

##  Policy_Area          Dataset                              Variable    
##  Length:14062       Length:14062       Rent-Burdened Population:14062  
##  Class :character   Class :character   18 to 24 Years          :    0  
##  Mode  :character   Mode  :character   25 to 34 Years          :    0  
##                                        35 to 44 Years          :    0  
##                                        45 to 54 Years          :    0  
##                                        55 to 64 Years          :    0  
##                                        (Other)                 :    0  
##      Name                Date               geoid          
##  Length:14062       Min.   :2010-01-01   Length:14062      
##  Class :character   1st Qu.:2011-01-01   Class :character  
##  Mode  :character   Median :2013-01-01   Mode  :character  
##                     Mean   :2012-07-02                     
##                     3rd Qu.:2014-01-01                     
##                     Max.   :2015-01-01                     
##                                                            
##      Tract            Value         Denominator    
##  Min.   :101110   Min.   :   0.0   Min.   :   0.0  
##  1st Qu.:211500   1st Qu.: 178.0   1st Qu.: 336.0  
##  Median :404803   Median : 365.0   Median : 641.0  
##  Mean   :407607   Mean   : 407.8   Mean   : 729.4  
##  3rd Qu.:552602   3rd Qu.: 575.0   3rd Qu.:1006.0  
##  Max.   :990300   Max.   :2584.0   Max.   :5379.0  
##                                                    
##  Denominator_Description    Row_ID           Location 1       
##  Length:14062            Length:14062       Length:14062      
##  Class :character        Class :character   Class :character  
##  Mode  :character        Mode  :character   Mode  :character  
##                                                               
##                                                               
##                                                               
##                                                               
##     Latitude       Longitude     
##  Min.   :33.14   Min.   :-118.9  
##  1st Qu.:33.96   1st Qu.:-118.4  
##  Median :34.05   Median :-118.3  
##  Mean   :34.07   Mean   :-118.3  
##  3rd Qu.:34.15   3rd Qu.:-118.1  
##  Max.   :34.79   Max.   :-117.7  
## 
## [1] "/Users/alicia.brown/data/ndsc/rent_burden_citizen_connect_delete.csv successfully created and ready for DataSync DELETED if needed."
## [1] "Dataset counts match."
## [1] "Rent Burden exists in Citizen Connect!"

Compare Variables between Citizen Connect Data and new dataset

Visually inspect that variable values are the same between datasets

Check that expected years of data are present

Visually inspect the span of data over time to check that counts per year match expected values

Tract Name Check

In Citizen Connect, the tract name caused text wrapping when “,Los Angeles County, California” was included, so that part was dropped from dataset and should be removed from individual datasets and the Tract Lookup dataset, https://usc.data.socrata.com/dataset/Census-Tract-Locations/atat-mmad/data.

## [1] "Need to drop county and state from Tract name."

Remove County and State from Tract Name if needed

Compare Tract Names
Tract Name
Census Tract 1011.10, Los Angeles County, California Census Tract 1011.10
Census Tract 1011.22, Los Angeles County, California Census Tract 1011.22
Census Tract 1012.10, Los Angeles County, California Census Tract 1012.10
Census Tract 1012.20, Los Angeles County, California Census Tract 1012.20
Census Tract 1013, Los Angeles County, California Census Tract 1013
Census Tract 1014, Los Angeles County, California Census Tract 1014
Census Tract 1021.03, Los Angeles County, California Census Tract 1021.03
Census Tract 1021.04, Los Angeles County, California Census Tract 1021.04
Census Tract 1021.05, Los Angeles County, California Census Tract 1021.05
Census Tract 1021.07, Los Angeles County, California Census Tract 1021.07

Denominators

If there should be a denominator, will need to join on https://usc.data.socrata.com/dataset/Variable-Denominators/xmps-ks6t/data by GEOID and Year to get the value to include with Citizen Connect. Currently checking the NDSC List of Variables for guidance on whether a Denominator should be added to a Variable.

Finalize Data

##                 Policy_Area           Dataset     
##  Housing & Real Estate:14062   Rent Burden:14062  
##                                                   
##                                                   
##                                                   
##                                                   
##                                                   
##                      Variable         Value          Percent         
##  Rent-Burdened Population:14062   Min.   :   0.0   Length:14062      
##                                   1st Qu.: 178.0   Class :character  
##                                   Median : 365.0   Mode  :character  
##                                   Mean   : 407.8                     
##                                   3rd Qu.: 575.0                     
##                                   Max.   :2584.0                     
##      Tract           GEOID            Location 1           Latitude    
##  Min.   :101110   Length:14062       Length:14062       Min.   :33.14  
##  1st Qu.:211500   Class :character   Class :character   1st Qu.:33.96  
##  Median :404803   Mode  :character   Mode  :character   Median :34.05  
##  Mean   :407607                                         Mean   :34.07  
##  3rd Qu.:552602                                         3rd Qu.:34.15  
##  Max.   :990300                                         Max.   :34.79  
##    Longitude         Row_ID               Date           
##  Min.   :-118.9   Length:14062       Min.   :2010-01-01  
##  1st Qu.:-118.4   Class :character   1st Qu.:2011-01-01  
##  Median :-118.3   Mode  :character   Median :2013-01-01  
##  Mean   :-118.3                      Mean   :2012-07-02  
##  3rd Qu.:-118.1                      3rd Qu.:2014-01-01  
##  Max.   :-117.7                      Max.   :2015-01-01  
##      Name           Denominator Description  Denominator    
##  Length:14062       Length:14062            Min.   :   0.0  
##  Class :character   Class :character        1st Qu.: 336.0  
##  Mode  :character   Mode  :character        Median : 641.0  
##                                             Mean   : 729.4  
##                                             3rd Qu.:1006.0  
##                                             Max.   :5379.0
## [1] "/Users/alicia.brown/data/ndsc/rent_burden_citizen_connect_upsert.csv successfully created and ready for DataSync UPSERT."

Publish the data