df <- read.csv("https://raw.githubusercontent.com/engine2031/Data-606/main/Rooftop_Drinking_Water_Tank_Inspection_Results.csv")
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
Is there a relationship between geographical location and failing rooftop water tank inspections in New York City?
What are the cases, and how many are there? Each case represents an inspection report for a tank. There are 31,536 cases in this data set.
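The case count can be checked directly from the data frame. The snippet below is a minimal sketch: the helper name count_cases and the toy data are hypothetical, and it assumes BIN identifies a building, so the number of distinct BIN values approximates the number of buildings inspected.

```r
# Sketch: count cases (rows) and distinct buildings in an inspection data frame.
# Assumption: BIN identifies a building; count_cases is a hypothetical helper.
count_cases <- function(data) {
  c(cases = nrow(data), buildings = length(unique(data$BIN)))
}

# Toy data for illustration only (not taken from the real file).
toy_cases <- data.frame(BIN = c(1000001, 1000001, 1000002),
                        TANK_NUM = c(1, 2, 1))
case_counts <- count_cases(toy_cases)
print(case_counts)

# With the real data loaded as df, the same call would be: count_cases(df)
```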
Describe the method of data collection. The data is collected by the New York City Department of Health and Mental Hygiene. Inspection results are self-reported by building owners and collected by the city agency.
What type of study is this (observational/experiment)? This is an observational study.
If you collected the data, state self-collected. If not, provide a citation/link.
https://data.cityofnewyork.us/Health/Rooftop-Drinking-Water-Tank-Inspection-Results/gjm4-k24g
What is the response variable? Is it quantitative or qualitative? The response variable is the neighborhood tabulation area (NTA), which is qualitative.
You should have two independent variables, one quantitative and one qualitative. The independent variables are the inspection results for sediment, biological growth, debris or insects, and rodents or birds, along with the lab test results for coliform and E. coli.
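Because these variables are stored as text, a frequency table per column is one way to see which values they take. The sketch below is illustrative: the column names come from the summary(df) output in this document, the helper tabulate_results is hypothetical, and the toy values "A" and "P" are placeholders rather than the codes actually used in the file.

```r
# Sketch: frequency table for each categorical inspection column.
# Column names are taken from the summary(df) output in this document.
result_cols <- c("SI_RESULT_SEDIMENT", "SI_RESULT_BIOLOGICAL_GROWTH",
                 "SI_RESULT_DEBRIS_INSECTS", "SI_RESULT_RODENT_BIRD",
                 "COLIFORM", "ECOLI")

# tabulate_results is a hypothetical helper, not part of any package.
tabulate_results <- function(data, cols = result_cols) {
  cols <- intersect(cols, names(data))  # skip any column that is absent
  lapply(data[cols], function(x) table(x, useNA = "ifany"))
}

# Toy data for illustration only; "A" and "P" are placeholder codes.
toy_results <- data.frame(SI_RESULT_SEDIMENT = c("A", "A", "P"),
                          COLIFORM = c("A", "P", NA))
freq <- tabulate_results(toy_results)
print(freq)

# With the real data loaded as df, the same call would be: tabulate_results(df)
```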
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
summary(df)
## BIN BOROUGH ZIP HOUSE_NUM
## Min. :1.000e+06 Length:31536 Min. :10001 Length:31536
## 1st Qu.:1.020e+06 Class :character 1st Qu.:10016 Class :character
## Median :1.042e+06 Mode :character Median :10022 Mode :character
## Mean :1.190e+07 Mean :10168
## 3rd Qu.:1.083e+06 3rd Qu.:10041
## Max. :4.102e+09 Max. :11694
##
## STREET_NAME BLOCK LOT CONFIRMATION_NUM
## Length:31536 Min. : 0 Min. : 0 Length:31536
## Class :character 1st Qu.: 827 1st Qu.: 10 Class :character
## Mode :character Median : 1226 Median : 31 Mode :character
## Mean : 1422 Mean :1310
## 3rd Qu.: 1484 3rd Qu.: 67
## Max. :16177 Max. :9100
##
## REPORTING_YEAR TANK_NUM INSPECTION_BY_FIRM INSPECTION_PERFORMED
## Min. :2014 Min. : 1.000 Length:31536 Length:31536
## 1st Qu.:2016 1st Qu.: 1.000 Class :character Class :character
## Median :2018 Median : 1.000 Mode :character Mode :character
## Mean :2018 Mean : 1.302
## 3rd Qu.:2019 3rd Qu.: 1.000
## Max. :2021 Max. :13.000
##
## INSPECTION_DATE GI_REQ_INTERNAL_STRUCTURE GI_RESULT_INTERNAL_STRUCTURE
## Length:31536 Length:31536 Length:31536
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## GI_REQ_EXTERNAL_STRUCTURE GI_RESULT_EXTERNAL_STRUCTURE GI_REQ_OVERFLOW_PIPES
## Length:31536 Length:31536 Length:31536
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## GI_RESULT_OVERFLOW_PIPES GI_REQ_ACCESS_LADDERS GI_RESULT_ACCESS_LADDERS
## Length:31536 Length:31536 Length:31536
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## GI_REQ_AIR_VENTS GI_RESULT_AIR_VENTS GI_REQ_ROOF_ACCESS
## Length:31536 Length:31536 Length:31536
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## GI_RESULT_ROOF_ACCESS SI_REQ_SEDIMENT SI_RESULT_SEDIMENT
## Length:31536 Length:31536 Length:31536
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## SI_REQ_BIOLOGICAL_GROWTH SI_RESULT_BIOLOGICAL_GROWTH SI_REQ_DEBRIS_INSECTS
## Length:31536 Length:31536 Length:31536
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## SI_RESULT_DEBRIS_INSECTS SI_REQ_RODENT_BIRD SI_RESULT_RODENT_BIRD
## Length:31536 Length:31536 Length:31536
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## SAMPLE_COLLECTED LAB_NAME NYS_CERTIFIED ANALYTES
## Length:31536 Length:31536 Length:31536 Length:31536
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## COLIFORM ECOLI MEET_STANDARDS DELETED
## Length:31536 Length:31536 Length:31536 Length:31536
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## LATITUDE LONGITUDE COMMUNITY_BOARD COUNCIL_DISTRICT
## Min. :40.51 Min. :-74.24 Min. : 1.00 Min. : 1.000
## 1st Qu.:40.74 1st Qu.:-73.99 1st Qu.: 5.00 1st Qu.: 3.000
## Median :40.76 Median :-73.98 Median : 6.00 Median : 4.000
## Mean :40.76 Mean :-73.97 Mean : 5.79 Mean : 7.162
## 3rd Qu.:40.78 3rd Qu.:-73.96 3rd Qu.: 8.00 3rd Qu.: 6.000
## Max. :40.91 Max. :-73.71 Max. :81.00 Max. :51.000
## NA's :63 NA's :63 NA's :63 NA's :63
## CENSUS_TRACT BBL NTA BATCH_DATE
## Min. : 1 Min. :1.000e+09 Length:31536 Length:31536
## 1st Qu.: 80 1st Qu.:1.008e+09 Class :character Class :character
## Median : 122 Median :1.013e+09 Mode :character Mode :character
## Mean : 3285 Mean :1.295e+09
## 3rd Qu.: 195 3rd Qu.:1.015e+09
## Max. :155101 Max. :5.080e+09
## NA's :63 NA's :80
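For the character columns, summary(df) reports only length, class, and mode, so a cross-tabulation may say more about the research question. The sketch below computes, for each area, the share of inspections falling under each MEET_STANDARDS value. The helper standards_by_area is hypothetical, and the toy values "Y" and "N" are assumptions about how the column might be coded, not values confirmed from the file.

```r
# Sketch: share of each MEET_STANDARDS value within each area (row proportions).
# standards_by_area is a hypothetical helper; BOROUGH could be swapped for NTA.
standards_by_area <- function(data, area = "BOROUGH", outcome = "MEET_STANDARDS") {
  counts <- table(data[[area]], data[[outcome]], useNA = "ifany")
  round(prop.table(counts, margin = 1), 3)
}

# Toy data for illustration only; "Y" and "N" are assumed codes.
toy_area <- data.frame(
  BOROUGH = c("MANHATTAN", "MANHATTAN", "BROOKLYN", "BROOKLYN"),
  MEET_STANDARDS = c("Y", "N", "Y", "Y")
)
shares <- standards_by_area(toy_area)
print(shares)

# With the real data loaded as df, a stacked bar chart could be drawn with:
# barplot(t(standards_by_area(df)), legend.text = TRUE, las = 2)
```

Passing area = "NTA" would give the same breakdown by neighborhood tabulation area, though with many more rows to read.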