Data Preparation

df <- read.csv("https://raw.githubusercontent.com/engine2031/Data-606/main/Rooftop_Drinking_Water_Tank_Inspection_Results.csv")

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

Is there a relationship between geographic location and failed rooftop water tank inspections in New York City?

Cases

What are the cases, and how many are there? Each case represents an inspection report for a tank. There are 31,536 cases in this data set.

Data collection

Describe the method of data collection. The data is collected by the New York City Department of Health and Mental Hygiene. Inspection results are self-reported by building owners and collected by the agency.

Type of study

What type of study is this (observational/experiment)? This is an observational study.

Data Source

If you collected the data, state self-collected. If not, provide a citation/link.

https://data.cityofnewyork.us/Health/Rooftop-Drinking-Water-Tank-Inspection-Results/gjm4-k24g

Dependent Variable

What is the response variable? Is it quantitative or qualitative? The response variable is the neighborhood tabulation area (the NTA column), which is qualitative.

Independent Variable

You should have two independent variables, one quantitative and one qualitative. The independent variables are the inspection results for sediment, biological growth, debris or insects, and rodents or birds, plus the lab test results for coliform and E. coli.
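Because these inspection and lab result columns are stored as character data, frequency tables describe them better than numeric summaries. The following is a minimal sketch under stated assumptions: `toy` is a small made-up data frame standing in for `df`, the "Y"/"N" values are invented because the actual category labels are not shown here, and `count_levels` is a hypothetical helper name.

```r
# Toy stand-in for df: the column names match the data set, but the
# values ("Y"/"N"/NA) are invented for illustration only.
toy <- data.frame(
  SI_RESULT_SEDIMENT          = c("Y", "N", "N", NA, "N"),
  SI_RESULT_BIOLOGICAL_GROWTH = c("N", "N", "Y", "N", "N"),
  COLIFORM                    = c("N", "N", "N", "Y", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one frequency table per column,
# keeping missing values visible as their own category.
count_levels <- function(data, cols) {
  lapply(data[cols], function(x) table(x, useNA = "ifany"))
}

counts <- count_levels(
  toy,
  c("SI_RESULT_SEDIMENT", "SI_RESULT_BIOLOGICAL_GROWTH", "COLIFORM")
)
print(counts$SI_RESULT_SEDIMENT)
```

Running the same helper on `df` would first show which labels each column really uses, which is worth checking before recoding anything as pass or fail.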

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

summary(df)
##       BIN              BOROUGH               ZIP         HOUSE_NUM        
##  Min.   :1.000e+06   Length:31536       Min.   :10001   Length:31536      
##  1st Qu.:1.020e+06   Class :character   1st Qu.:10016   Class :character  
##  Median :1.042e+06   Mode  :character   Median :10022   Mode  :character  
##  Mean   :1.190e+07                      Mean   :10168                     
##  3rd Qu.:1.083e+06                      3rd Qu.:10041                     
##  Max.   :4.102e+09                      Max.   :11694                     
##                                                                           
##  STREET_NAME            BLOCK            LOT       CONFIRMATION_NUM  
##  Length:31536       Min.   :    0   Min.   :   0   Length:31536      
##  Class :character   1st Qu.:  827   1st Qu.:  10   Class :character  
##  Mode  :character   Median : 1226   Median :  31   Mode  :character  
##                     Mean   : 1422   Mean   :1310                     
##                     3rd Qu.: 1484   3rd Qu.:  67                     
##                     Max.   :16177   Max.   :9100                     
##                                                                      
##  REPORTING_YEAR    TANK_NUM      INSPECTION_BY_FIRM INSPECTION_PERFORMED
##  Min.   :2014   Min.   : 1.000   Length:31536       Length:31536        
##  1st Qu.:2016   1st Qu.: 1.000   Class :character   Class :character    
##  Median :2018   Median : 1.000   Mode  :character   Mode  :character    
##  Mean   :2018   Mean   : 1.302                                          
##  3rd Qu.:2019   3rd Qu.: 1.000                                          
##  Max.   :2021   Max.   :13.000                                          
##                                                                         
##  INSPECTION_DATE    GI_REQ_INTERNAL_STRUCTURE GI_RESULT_INTERNAL_STRUCTURE
##  Length:31536       Length:31536              Length:31536                
##  Class :character   Class :character          Class :character            
##  Mode  :character   Mode  :character          Mode  :character            
##                                                                           
##                                                                           
##                                                                           
##                                                                           
##  GI_REQ_EXTERNAL_STRUCTURE GI_RESULT_EXTERNAL_STRUCTURE GI_REQ_OVERFLOW_PIPES
##  Length:31536              Length:31536                 Length:31536         
##  Class :character          Class :character             Class :character     
##  Mode  :character          Mode  :character             Mode  :character     
##                                                                              
##                                                                              
##                                                                              
##                                                                              
##  GI_RESULT_OVERFLOW_PIPES GI_REQ_ACCESS_LADDERS GI_RESULT_ACCESS_LADDERS
##  Length:31536             Length:31536          Length:31536            
##  Class :character         Class :character      Class :character        
##  Mode  :character         Mode  :character      Mode  :character        
##                                                                         
##                                                                         
##                                                                         
##                                                                         
##  GI_REQ_AIR_VENTS   GI_RESULT_AIR_VENTS GI_REQ_ROOF_ACCESS
##  Length:31536       Length:31536        Length:31536      
##  Class :character   Class :character    Class :character  
##  Mode  :character   Mode  :character    Mode  :character  
##                                                           
##                                                           
##                                                           
##                                                           
##  GI_RESULT_ROOF_ACCESS SI_REQ_SEDIMENT    SI_RESULT_SEDIMENT
##  Length:31536          Length:31536       Length:31536      
##  Class :character      Class :character   Class :character  
##  Mode  :character      Mode  :character   Mode  :character  
##                                                             
##                                                             
##                                                             
##                                                             
##  SI_REQ_BIOLOGICAL_GROWTH SI_RESULT_BIOLOGICAL_GROWTH SI_REQ_DEBRIS_INSECTS
##  Length:31536             Length:31536                Length:31536         
##  Class :character         Class :character            Class :character     
##  Mode  :character         Mode  :character            Mode  :character     
##                                                                            
##                                                                            
##                                                                            
##                                                                            
##  SI_RESULT_DEBRIS_INSECTS SI_REQ_RODENT_BIRD SI_RESULT_RODENT_BIRD
##  Length:31536             Length:31536       Length:31536         
##  Class :character         Class :character   Class :character     
##  Mode  :character         Mode  :character   Mode  :character     
##                                                                   
##                                                                   
##                                                                   
##                                                                   
##  SAMPLE_COLLECTED     LAB_NAME         NYS_CERTIFIED        ANALYTES        
##  Length:31536       Length:31536       Length:31536       Length:31536      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    COLIFORM            ECOLI           MEET_STANDARDS       DELETED         
##  Length:31536       Length:31536       Length:31536       Length:31536      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     LATITUDE       LONGITUDE      COMMUNITY_BOARD COUNCIL_DISTRICT
##  Min.   :40.51   Min.   :-74.24   Min.   : 1.00   Min.   : 1.000  
##  1st Qu.:40.74   1st Qu.:-73.99   1st Qu.: 5.00   1st Qu.: 3.000  
##  Median :40.76   Median :-73.98   Median : 6.00   Median : 4.000  
##  Mean   :40.76   Mean   :-73.97   Mean   : 5.79   Mean   : 7.162  
##  3rd Qu.:40.78   3rd Qu.:-73.96   3rd Qu.: 8.00   3rd Qu.: 6.000  
##  Max.   :40.91   Max.   :-73.71   Max.   :81.00   Max.   :51.000  
##  NA's   :63      NA's   :63       NA's   :63      NA's   :63      
##   CENSUS_TRACT         BBL                NTA             BATCH_DATE       
##  Min.   :     1   Min.   :1.000e+09   Length:31536       Length:31536      
##  1st Qu.:    80   1st Qu.:1.008e+09   Class :character   Class :character  
##  Median :   122   Median :1.013e+09   Mode  :character   Mode  :character  
##  Mean   :  3285   Mean   :1.295e+09                                        
##  3rd Qu.:   195   3rd Qu.:1.015e+09                                        
##  Max.   :155101   Max.   :5.080e+09                                        
##  NA's   :63       NA's   :80
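For character columns such as BOROUGH, NTA, and MEET_STANDARDS, `summary(df)` reports only length and class, so it says little about the research question. One possible follow-up is the share of inspections that do not meet standards in each location, drawn as a bar plot. This is a sketch, not the final analysis: `toy` is a made-up data frame, `fail_rate_by_group` is a hypothetical helper, and the "N" code for a failed inspection is an assumption that should be checked against the real values in `df`.

```r
# Toy stand-in for df: borough names and Y/N codes are invented
# for illustration and are not taken from the real data set.
toy <- data.frame(
  BOROUGH        = c("MANHATTAN", "MANHATTAN", "MANHATTAN",
                     "BROOKLYN", "BROOKLYN", "QUEENS"),
  MEET_STANDARDS = c("Y", "N", "Y", "N", "N", "Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: proportion of rows per group whose result
# equals fail_value (assumed here to be "N").
fail_rate_by_group <- function(data, group_col, result_col, fail_value = "N") {
  failed <- data[[result_col]] == fail_value
  tapply(failed, data[[group_col]], mean, na.rm = TRUE)
}

rates <- fail_rate_by_group(toy, "BOROUGH", "MEET_STANDARDS")
print(round(rates, 2))

# Bar plot of failure proportions, highest first.
barplot(sort(rates, decreasing = TRUE),
        ylab = "Proportion not meeting standards",
        las = 2)
```

Passing `"NTA"` as `group_col` would give the same breakdown at the neighborhood tabulation area level, though with many more bars.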