Boundaries Data EDA

Author

Nissim Lebovits

Published

June 2, 2023

Overview

This document presents summary statistics and exploratory data analysis for the new boundaries dataset produced by Henry Feinstein in early summer 2023. There are three datasets: the full spatial dataset containing EPIC data, Census variables associated with each PWSID, and distance variables associated with each PWSID.

Full Dataset

Index Variable_Name Variable_Type Sample_n Missing_Count Per_of_Missing No_of_distinct_values mean median var
1 pwsid character 6573 0 0.000 6573 NA NA NA
2 pws_name character 6573 0 0.000 6525 NA NA NA
3 primacy_agency_code character 6573 0 0.000 13 NA NA NA
4 state_code character 6573 0 0.000 13 NA NA NA
5 city_served character 3058 3515 0.535 1320 NA NA NA
6 county_served character 6553 20 0.003 397 NA NA NA
7 population_served_count numeric 6573 0 0.000 3023 9445.21 582.00 4788165975.74
8 service_connections_count numeric 6573 0 0.000 2374 3194.50 222.00 508182302.65
9 service_area_type_code character 6573 0 0.000 315 NA NA NA
10 owner_type_code character 6573 0 0.000 5 NA NA NA
11 is_wholesaler_ind logical 6573 0 0.000 2 NA NA NA
12 primacy_type character 6573 0 0.000 1 NA NA NA
13 primary_source_code character 6573 0 0.000 5 NA NA NA
14 tier numeric 6573 0 0.000 3 2.11 2.00 0.81
15 geometry_lat numeric 6573 0 0.000 4929 31.95 30.67 12.96
16 geometry_long numeric 6573 0 0.000 4930 -85.69 -82.46 54.71
17 geometry_quality character 6552 21 0.003 22 NA NA NA
18 geometry_source_detail character 6048 525 0.080 15 NA NA NA
19 pred_05 numeric 3068 3505 0.533 1559 777.78 326.41 1452779.27
20 pred_50 numeric 3068 3505 0.533 1559 846.42 349.91 1780649.93
21 pred_95 numeric 3068 3505 0.533 1559 925.25 371.82 2226909.77
22 relocated_centroid numeric 3068 3505 0.533 2 0.22 0.00 0.17
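
The columns in the table above follow a simple pattern. Sample_n plus Missing_Count equals 6,573 for every variable shown, so Sample_n appears to count non-missing values, and Per_of_Missing is Missing_Count divided by that total (a proportion, not a percentage). The Python sketch below recomputes these columns for a single column of values. The function name `summarize_column`, the use of `None` to mark missing values, the exclusion of missing values from the distinct count, and the choice of sample variance are all assumptions made for illustration; this is not the code that produced the report.

```python
import statistics


def summarize_column(values):
    """Summarize one column. None marks a missing value (an assumption)."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    row = {
        "Sample_n": len(present),            # non-missing observations
        "Missing_Count": missing,
        "Per_of_Missing": round(missing / len(values), 3) if values else 0.0,
        "No_of_distinct_values": len(set(present)),
        "mean": None,
        "median": None,
        "var": None,
    }
    # Only numeric columns get mean, median, and variance; others stay None (NA).
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        row["mean"] = round(statistics.mean(present), 2)
        row["median"] = round(statistics.median(present), 2)
        if len(present) > 1:
            # Sample variance (n - 1 denominator), assumed here.
            row["var"] = round(statistics.variance(present), 2)
    return row


# Cross-check of the city_served row: 3,515 missing out of 6,573 systems.
city_served_missing_share = round(3515 / 6573, 3)
print(city_served_missing_share)
```

The final lines check the city_served row by hand: 3,515 missing values out of 6,573 rows rounds to the 0.535 shown in the table.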

[Figures not preserved: six bar plots, two rotated bar plots, five histograms, and one polygon plot.]

Census Variables

Index Variable_Name Variable_Type Sample_n Missing_Count Per_of_Missing No_of_distinct_values mean median var
1 pwsid character 6573 0 0.000 6573 NA NA NA
2 TotPop numeric 6573 0 0.000 6484 7517.38 116.76 2904290771.21
3 TotHH numeric 6573 0 0.000 6476 2746.08 42.05 384678271.04
4 MedInc numeric 6413 160 0.024 4510 60228.76 54254.64 656460098.54
5 HHBurden numeric 6188 385 0.059 4373 19.53 19.93 33.72
6 PctMinority numeric 6472 101 0.015 4605 0.38 0.35 0.05
7 MedYrBuilt numeric 6345 228 0.035 2949 1978.29 1987.39 14138.93
8 PctNoPlumbing numeric 6446 127 0.019 4039 0.03 0.02 0.00
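
The var column is reported in squared units, which makes it hard to compare against the mean and median. As a minimal sketch, using only variances copied from the table above, the following Python snippet converts variance to standard deviation. The dictionary names are hypothetical, and only three variables are included for brevity.

```python
import math

# Variances copied from the table above (in squared units of each variable).
census_var = {
    "MedInc": 656460098.54,
    "HHBurden": 33.72,
    "PctMinority": 0.05,
}

# Standard deviation is the square root of the variance,
# expressed in the variable's own units.
census_sd = {name: math.sqrt(v) for name, v in census_var.items()}

for name, sd in census_sd.items():
    print(f"{name}: sd = {sd:.2f}")
```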

[Figures not preserved: seven plots.]

Distance Variables

Index Variable_Name Variable_Type Sample_n Missing_Count Per_of_Missing No_of_distinct_values mean median var
1 pwsid character 6573 0 0 6573 NA NA NA
2 utilcount_20m integer 6573 0 0 9 19.11 3.00 592.07
3 uphill_pct numeric 6570 3 0 91 0.55 0.67 0.21
4 closest_util character 6573 0 0 128 NA NA NA

[Figures not preserved: two plots.]