Overview
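This document contains summary statistics and exploratory data analysis for the new boundaries dataset produced by Henry Feinstein in early summer 2023. There are three datasets: the full spatial dataset containing EPIC data, the Census variables associated with each public water system ID (PWSID), and the distance variables associated with each PWSID. In each summary table, Sample_n is the number of non-missing values and Missing_Count is the number of missing values. Per_of_Missing is the proportion (not the percentage) of the 6573 systems with a missing value.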
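All three tables are keyed by `pwsid`, and each reports 6573 distinct values with none missing, so they can be combined into one record per system. The following is a minimal standard-library Python sketch, not the author's pipeline. It assumes each table has been loaded as a list of dictionaries. The function names `index_by_pwsid` and `join_on_pwsid` and the toy PWSIDs and values are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def index_by_pwsid(rows: List[Row]) -> Dict[str, Row]:
    """Map each pwsid to its row, raising if a pwsid appears more than once."""
    index: Dict[str, Row] = {}
    for row in rows:
        key = str(row["pwsid"])
        if key in index:
            raise ValueError(f"duplicate pwsid: {key}")
        index[key] = row
    return index


def join_on_pwsid(full: List[Row], census: List[Row], distance: List[Row]) -> List[Row]:
    """Left-join the census and distance columns onto the full dataset by pwsid."""
    index_by_pwsid(full)  # validate that the base table is also unique by pwsid
    census_idx = index_by_pwsid(census)
    distance_idx = index_by_pwsid(distance)
    joined: List[Row] = []
    for row in full:
        key = str(row["pwsid"])
        merged = dict(row)
        # A pwsid with no match keeps only its original columns (left join).
        # A column name shared by two tables takes the later table's value.
        merged.update(census_idx.get(key, {}))
        merged.update(distance_idx.get(key, {}))
        joined.append(merged)
    return joined


# Toy example with hypothetical PWSIDs and values.
full = [{"pwsid": "A1", "tier": 2}, {"pwsid": "B2", "tier": 3}]
census = [{"pwsid": "A1", "MedInc": 50000.0}, {"pwsid": "B2", "MedInc": 61000.0}]
distance = [{"pwsid": "A1", "uphill_pct": 0.5}]
rows = join_on_pwsid(full, census, distance)
print(rows[1])  # {'pwsid': 'B2', 'tier': 3, 'MedInc': 61000.0}
```

Checking uniqueness first turns a silent many-to-one match into an explicit error. This matters because every table here is expected to hold exactly one row per PWSID.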
Full Dataset
| Index | Variable_Name | Variable_Type | Sample_n | Missing_Count | Per_of_Missing | No_of_distinct_values | mean | median | var |
|---|---|---|---|---|---|---|---|---|---|
| 1 | pwsid | character | 6573 | 0 | 0.000 | 6573 | NA | NA | NA |
| 2 | pws_name | character | 6573 | 0 | 0.000 | 6525 | NA | NA | NA |
| 3 | primacy_agency_code | character | 6573 | 0 | 0.000 | 13 | NA | NA | NA |
| 4 | state_code | character | 6573 | 0 | 0.000 | 13 | NA | NA | NA |
| 5 | city_served | character | 3058 | 3515 | 0.535 | 1320 | NA | NA | NA |
| 6 | county_served | character | 6553 | 20 | 0.003 | 397 | NA | NA | NA |
| 7 | population_served_count | numeric | 6573 | 0 | 0.000 | 3023 | 9445.21 | 582.00 | 4788165975.74 |
| 8 | service_connections_count | numeric | 6573 | 0 | 0.000 | 2374 | 3194.50 | 222.00 | 508182302.65 |
| 9 | service_area_type_code | character | 6573 | 0 | 0.000 | 315 | NA | NA | NA |
| 10 | owner_type_code | character | 6573 | 0 | 0.000 | 5 | NA | NA | NA |
| 11 | is_wholesaler_ind | logical | 6573 | 0 | 0.000 | 2 | NA | NA | NA |
| 12 | primacy_type | character | 6573 | 0 | 0.000 | 1 | NA | NA | NA |
| 13 | primary_source_code | character | 6573 | 0 | 0.000 | 5 | NA | NA | NA |
| 14 | tier | numeric | 6573 | 0 | 0.000 | 3 | 2.11 | 2.00 | 0.81 |
| 15 | geometry_lat | numeric | 6573 | 0 | 0.000 | 4929 | 31.95 | 30.67 | 12.96 |
| 16 | geometry_long | numeric | 6573 | 0 | 0.000 | 4930 | -85.69 | -82.46 | 54.71 |
| 17 | geometry_quality | character | 6552 | 21 | 0.003 | 22 | NA | NA | NA |
| 18 | geometry_source_detail | character | 6048 | 525 | 0.080 | 15 | NA | NA | NA |
| 19 | pred_05 | numeric | 3068 | 3505 | 0.533 | 1559 | 777.78 | 326.41 | 1452779.27 |
| 20 | pred_50 | numeric | 3068 | 3505 | 0.533 | 1559 | 846.42 | 349.91 | 1780649.93 |
| 21 | pred_95 | numeric | 3068 | 3505 | 0.533 | 1559 | 925.25 | 371.82 | 2226909.77 |
| 22 | relocated_centroid | numeric | 3068 | 3505 | 0.533 | 2 | 0.22 | 0.00 | 0.17 |
[Figures: bar plots, rotated bar plots, and histograms for the full dataset]
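The summary columns in the table above can be reproduced for a single variable with a short function. This is a sketch under stated assumptions, not necessarily how the tables were generated:

- Missing values are represented as `None`.
- The distinct count excludes missing values.
- mean, median, and var are reported only for numeric, non-logical columns (the table shows NA for `is_wholesaler_ind`).
- var is the sample (n - 1) variance, which is what R's `var()` computes.

The function name `summarize` is hypothetical.

```python
import statistics
from typing import Dict, Sequence, Union

# Assumed value type for one column: missing entries are None.
Value = Union[str, float, int, bool, None]


def summarize(values: Sequence[Value], decimals: int = 3) -> Dict[str, object]:
    """Compute the table's summary columns for one variable (hypothetical helper)."""
    present = [v for v in values if v is not None]
    n_total = len(values)
    n_missing = n_total - len(present)
    summary: Dict[str, object] = {
        "Sample_n": len(present),  # non-missing count
        "Missing_Count": n_missing,
        # Proportion (0 to 1) of rows that are missing, not a percentage.
        "Per_of_Missing": round(n_missing / n_total, decimals) if n_total else 0.0,
        "No_of_distinct_values": len(set(present)),  # missing values excluded
        "mean": None,
        "median": None,
        "var": None,
    }
    # Moments are reported only for numeric, non-boolean columns; the sample
    # variance (n - 1 denominator) needs at least two observations.
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric and len(present) >= 2:
        summary["mean"] = statistics.mean(present)
        summary["median"] = statistics.median(present)
        summary["var"] = statistics.variance(present)
    return summary


# A toy 0/1 indicator with one missing entry (hypothetical values).
print(summarize([0, 1, None, 0, 0]))
```

Applied to a column with 3515 missing values out of 6573, such as city_served, `Per_of_Missing` rounds to 0.535, which matches the table.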
Census Variables
| Index | Variable_Name | Variable_Type | Sample_n | Missing_Count | Per_of_Missing | No_of_distinct_values | mean | median | var |
|---|---|---|---|---|---|---|---|---|---|
| 1 | pwsid | character | 6573 | 0 | 0.000 | 6573 | NA | NA | NA |
| 2 | TotPop | numeric | 6573 | 0 | 0.000 | 6484 | 7517.38 | 116.76 | 2904290771.21 |
| 3 | TotHH | numeric | 6573 | 0 | 0.000 | 6476 | 2746.08 | 42.05 | 384678271.04 |
| 4 | MedInc | numeric | 6413 | 160 | 0.024 | 4510 | 60228.76 | 54254.64 | 656460098.54 |
| 5 | HHBurden | numeric | 6188 | 385 | 0.059 | 4373 | 19.53 | 19.93 | 33.72 |
| 6 | PctMinority | numeric | 6472 | 101 | 0.015 | 4605 | 0.38 | 0.35 | 0.05 |
| 7 | MedYrBuilt | numeric | 6345 | 228 | 0.035 | 2949 | 1978.29 | 1987.39 | 14138.93 |
| 8 | PctNoPlumbing | numeric | 6446 | 127 | 0.019 | 4039 | 0.03 | 0.02 | 0.00 |
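As a quick arithmetic check on the Census table, `Sample_n` and `Missing_Count` should sum to the 6573 systems. `Per_of_Missing` should equal `Missing_Count / 6573` rounded to three decimals; for example, 385 / 6573 ≈ 0.0586, reported as 0.059. The sketch below copies those three columns from the table. The names `CENSUS_ROWS` and `check_rows` are illustrative, not part of the original analysis.

```python
# Total number of systems, taken from the pwsid row of each table.
N_SYSTEMS = 6573

# (Variable_Name, Sample_n, Missing_Count, Per_of_Missing) copied from the table.
CENSUS_ROWS = [
    ("pwsid", 6573, 0, 0.000),
    ("TotPop", 6573, 0, 0.000),
    ("TotHH", 6573, 0, 0.000),
    ("MedInc", 6413, 160, 0.024),
    ("HHBurden", 6188, 385, 0.059),
    ("PctMinority", 6472, 101, 0.015),
    ("MedYrBuilt", 6345, 228, 0.035),
    ("PctNoPlumbing", 6446, 127, 0.019),
]


def check_rows(rows, n_total):
    """Return the names of variables whose counts or missing proportion disagree."""
    bad = []
    for name, sample_n, missing, per_missing in rows:
        counts_ok = sample_n + missing == n_total
        proportion_ok = round(missing / n_total, 3) == per_missing
        if not (counts_ok and proportion_ok):
            bad.append(name)
    return bad


print(check_rows(CENSUS_ROWS, N_SYSTEMS))  # []
```

An empty list means every row is internally consistent. This also confirms that `Per_of_Missing` is a proportion rather than a percentage.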
Distance Variables
| Index | Variable_Name | Variable_Type | Sample_n | Missing_Count | Per_of_Missing | No_of_distinct_values | mean | median | var |
|---|---|---|---|---|---|---|---|---|---|
| 1 | pwsid | character | 6573 | 0 | 0.000 | 6573 | NA | NA | NA |
| 2 | utilcount_20m | integer | 6573 | 0 | 0.000 | 9 | 19.11 | 3.00 | 592.07 |
| 3 | uphill_pct | numeric | 6570 | 3 | 0.000 | 91 | 0.55 | 0.67 | 0.21 |
| 4 | closest_util | character | 6573 | 0 | 0.000 | 128 | NA | NA | NA |