SPS_Data607_FinalProject_DC

Author

David Chen

Introduction

In New York City’s vertical landscape, elevator reliability is not merely a convenience—it is a critical component of urban mobility, accessibility, and public safety.With over 70,000 elevators across the five boroughs, building owners and facility managers face a significant challenge in selecting maintenance providers that are both technically proficient and customer-oriented.

This project introduces a data-driven framework to generate a comprehensive Elevator Company Recommendation List.

The Multi-Source Approach

To provide a holistic view of contractor performance, the project integrates three distinct data streams:

NYC Department of Buildings (DoB) Open Data: We utilize the Elevator job applications 
submittion dataset.

Google Places API: This provides essential geographic and operational context, 
allowing us to map service providers.

Yelp Fusion API: Incorporating Yelp reviews allows us to add antoher weigh for rating.

Project Objective

The ultimate goal of this analysis is to produce a ranked directory of elevator maintenance companies. By quantifying factors such as mean time between violations, average customer rating, and geographic coverage, this tool empowers NYC stakeholders to make evidence-based decisions that enhance building safety and tenant satisfaction.

API key saved in .env file.

Structure .env File, the file follows the standard KEY=VALUE format without spaces around the equals sign:

#install.packages("dotenv")
library(dotenv)
load_dot_env()

Google API

The Google Places API is a web service provided through the Google Maps Platform that lets developers access detailed information about geographic locations and businesses around the world.

Here is my example of using Google API to get rating and other information.

#install.packages("googleway")
library(googleway)
load_dot_env()
g_api_key <- Sys.getenv("gAPIkey")
#print(g_api_key)

g_res <- google_places(search_string = "SCHINDLER ELEVATOR CORP     10036", 
                     key = g_api_key)

print(g_res$results[, c("name", "formatted_address", "rating", "place_id")])
                                               name
1 Schindler Elevators & Escalators in New York City
                                 formatted_address rating
1 1133 6th Ave 28th floor, New York, NY 10036, USA    3.4
                     place_id
1 ChIJK7Xpp09YwokR2rC-SNLPSV0
target_id <- g_res$results$place_id[1]

# 2. Get full details for that specific business
g_details <- google_place_details(place_id = target_id, key = g_api_key)

# Access specific info
business_phone <- g_details$result$formatted_phone_number
business_site  <- g_details$result$website
business_hours <- g_details$result$opening_hours$weekday_text

print(business_hours)
[1] "Monday: 8:00 AM – 5:00 PM"    "Tuesday: 8:00 AM – 5:00 PM"  
[3] "Wednesday: 8:00 AM – 5:00 PM" "Thursday: 8:00 AM – 5:00 PM" 
[5] "Friday: 8:00 AM – 5:00 PM"    "Saturday: Closed"            
[7] "Sunday: Closed"              

Yelp API

The Yelp Fusion API is the official developer API provided by Yelp, designed to give applications access to Yelp’s large database of local business information, reviews, and ratings.

Unlike mapping-focused APIs, Yelp’s API is centered more on consumer review and business reputation data, making it especially useful for recommendation and discovery apps.

Here is my Example of using Yelp API to get rating.

#install.packages("yelpr")
#library(yelpr)
library(httr)
library(jsonlite)
load_dot_env()
y_api_key <- Sys.getenv("yelpkey")
y_url <- "https://api.yelp.com/v3/businesses/search"

# Define parameters
params <- list(
  term = "SCHINDLER ELEVATOR CORP   ",
  location = "10036"
)

# Make the GET request
y_response <- GET(
  url = y_url,
  add_headers(Authorization = paste("Bearer", y_api_key)),
  query = params
)

# Parse the JSON response
y_data <- fromJSON(content(y_response, as = "text", encoding = "UTF-8"))

# Access the rating
y_rating <- y_data$businesses$rating[1]
print(y_rating)
[1] 4.9

Load data from Opendata NYC

The DOB NOW Elevator permit dataset was subset to include only the Applicant.License.Number”, “Applicant.Business.Name”, “Applicant.Address”, “Applicant.First.Name”, “Applicant.Last.Name”, “Applicant.City”, “Applicant.Zip, and ‘Applicant License Number’ columns.

This focused selection significantly reduces the dataframe’s memory footprint.

tmp_df_elevator <- read.csv("DOB_NOW__Build_Elevator_Permit.csv")
colnames(tmp_df_elevator)
 [1] "Job.Filling.Number"                 "Job.Number"                        
 [3] "Filing.Number"                      "Filing.Date"                       
 [5] "Filing.Type"                        "Elevator.Device.Type"              
 [7] "Filing.Status"                      "Filing.Status.or.Filing.Includes"  
 [9] "Building.Code"                      "Electrical.Permit.Number"          
[11] "BIN"                                "House.Number"                      
[13] "Street.Name"                        "ZIP"                               
[15] "Borough"                            "Block"                             
[17] "Lot"                                "BuildingType"                      
[19] "Building.Stories"                   "Is.In.Conjunction"                 
[21] "Associated.BIS.Job.Number"          "Total.Construction.Floor.Area"     
[23] "Plan.Examiner.Assigned.Date"        "Incomplete.Date"                   
[25] "Last.Incomplete.Submission.Date"    "X1st.Objection.Date"               
[27] "Last.Objection.Date"                "Resubmission.Date"                 
[29] "Permit.Entire.Date"                 "Signed.Off.Date"                   
[31] "Applicant.First.Name"               "Applicant.Last.Name"               
[33] "Applicant.Business.Name"            "Applicant.Address"                 
[35] "Applicant.City"                     "Applicant.State"                   
[37] "Applicant.Zip"                      "Applicant.License.Number"          
[39] "Design.Professional.First.Name"     "Design.Professional.Last.Name"     
[41] "Design.Professional.Business.Name"  "Design.Professional.Address"       
[43] "Design.Professional.City"           "Design.Professional.State"         
[45] "Design.Professional..Zip"           "Design.Professional.License.Number"
[47] "Owner.First.Name"                   "Owner.Last.Name"                   
[49] "Owner.Title"                        "Owner.Business.Name"               
[51] "Owner.Address"                      "Owner.City"                        
[53] "Owner.State"                        "Owner.Zip"                         
[55] "Owner.Type"                         "Asbestos.Abatement.Complaince"     
[57] "DEP.ACP.5.Control.No"               "Scope.of.Work"                     
[59] "General.Liability"                  "GL.Certificate.Policy.No"          
[61] "GL.Expiration.Date"                 "Workers.Compensation"              
[63] "WC.Certificate.Policy.No"           "WC.Expiration.Date"                
[65] "Disability"                         "Disability.Certificate.Policy.No"  
[67] "Disability.Expiration.Date"         "Estimated.Cost"                    
[69] "Filing.Fee"                         "No.Good.Check"                     
[71] "Total.Fee"                          "Amount.Paid"                       
[73] "Amount.Due"                         "Review.Type..PPN."                 
[75] "Permit.Expiration.Date"             "LATITUDE"                          
[77] "LONGITUDE"                          "COMMUNITY_DISTRICT_NUMBER"         
[79] "CITY_COUNCIL_DISTRICT"              "CENSUS_TRACT"                      
[81] "BBL"                                "NTA_NAME"                          

Subset the dataframe

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyr)
df_elevator <- tmp_df_elevator[c("Applicant.License.Number",
                                 "Applicant.Business.Name",
                                 "Applicant.Address",
                                 "Applicant.First.Name",
                                 "Applicant.Last.Name",
                                 "Applicant.City",
                                 "Applicant.Zip")]
head(df_elevator)
  Applicant.License.Number   Applicant.Business.Name       Applicant.Address
1               L - 445001     ROTAVELE ELEVATOR INC       414 SENECA AVENUE
2               L - 074117   SCHINDLER ELEVATOR CORP                        
3               L - 159169 NEW YORK CITY HOUSING AUT          23-02 49TH AVE
4               L - 470001 ALLIANCE ELEV COMPANY LLC                        
5               L - 556001   MILLENNIUM ELEV ENT LLC 2618 AVENUE Z 1ST FLOOR
6               L - 564001        START ELEVATOR LLC                        
  Applicant.First.Name Applicant.Last.Name   Applicant.City Applicant.Zip
1                FRANK          TORTORELLA        RIDGEWOOD         11385
2              WILLIAM             MENSCHE         NEW YORK         10036
3                ALVIN              DUNCAN          BALDWIN         11510
4               DONALD        BASEDOW  JR. LONG ISLAND CITY         11106
5                BORIS             EVELKIN         BROOKLYN         11235
6                KEVIN                SHEA            BRONX         10466
df_ele_business <- df_elevator %>%
  select(Applicant.Business.Name,Applicant.Address,Applicant.Zip)%>%
  distinct(Applicant.Business.Name, .keep_all = TRUE)
head(df_ele_business)
    Applicant.Business.Name       Applicant.Address Applicant.Zip
1     ROTAVELE ELEVATOR INC       414 SENECA AVENUE         11385
2   SCHINDLER ELEVATOR CORP                                 10036
3 NEW YORK CITY HOUSING AUT          23-02 49TH AVE         11510
4 ALLIANCE ELEV COMPANY LLC                                 11106
5   MILLENNIUM ELEV ENT LLC 2618 AVENUE Z 1ST FLOOR         11235
6        START ELEVATOR LLC                                 10466
dim(df_ele_business)
[1] 320   3
df_ele_business <- df_ele_business%>%
  unite("Business_info", 1:3, sep = " ", remove = FALSE)
dim(df_ele_business)
[1] 320   4

Add a column that combines the business name, address, and ZIP code to use as the search input.

Looping the Google API.

Iterate through the API calls to fetch complete business information, including longitude, latitude, and rating.

Here , I pre-run the loop and saved the data into the csv file.

df_new_res <- read.csv( "elevator_business_list.csv")
head(df_new_res)
              business_name business_phone business_lat business_lng
1     ROTAVELE ELEVATOR INC (718) 386-3000     40.70631    -73.91397
2   SCHINDLER ELEVATOR CORP (212) 708-1000     40.75608    -73.98367
3 NEW YORK CITY HOUSING AUT (718) 707-7771     40.74185    -73.94644
4 ALLIANCE ELEV COMPANY LLC           <NA>           NA           NA
5   MILLENNIUM ELEV ENT LLC (718) 615-0406     40.58903    -73.94286
6        START ELEVATOR LLC (718) 324-9166     40.89859    -73.85939
  business_rating
1             3.9
2             3.4
3             3.8
4              NA
5             3.7
6             3.6

This is the actual loop that calls the API, processes the response, and saves it to a file.

library(jsonlite)
#rm(flat_res)
df_new_res <- data.frame(
  business_name = character(),
  business_phone= character(),
  business_lat= double(),
  business_lng= double(),
  business_rating=double()
)
for (i in seq_len(nrow(df_ele_business))) {
#for (i in 1:5) {  
  # Perform your logic here
  search_str_input <- df_ele_business[i, 1]
  
  g_res <- google_places(search_string = search_str_input, 
                     key = g_api_key)

  #print(g_res$results[, c("name", "formatted_address", "rating", "place_id")])


  target_id <- g_res$results$place_id[1]

# 2. Get full details for that specific business
  g_details <- google_place_details(place_id = target_id, key = g_api_key)

# Access specific info
  bus_phone <- g_details$result$formatted_phone_number
  bus_hours <- g_details$result$opening_hours$weekday_text
  bus_lat <- g_details$result$geometry$location$lat
  bus_lng <- g_details$result$geometry$location$lng
  bus_rating <- g_details$result$rating
  
  add_bus_info <- tibble(
    business_name = df_ele_business[i, 2],
    business_phone= bus_phone,
    business_lat=bus_lat,
    business_lng= bus_lng,
    business_rating= bus_rating
    
  )
  df_new_res<- bind_rows(df_new_res,add_bus_info)
  
  
  #df_new_res[i,1:5] <- c(,business_phone,business_lat,business_lng,business_rating)
  
  
  
  
  # flat_res <- flatten(as.data.frame(g_details$result)) 
  #  in flatten(g_details$result) : is.data.frame(x) is not TRUE
  # flat_res <- flat_res %>%
  #   mutate(
  #     business_name = df_ele_business[i, 2],
  #   business_address=df_ele_business[i, 3],
  #   business_zipcode=df_ele_business[i, 4]
  #   )
  # 
  # df_new_res <- df_new_res %>%
  #   right_join(flat_res,by="business_name")
  Sys.sleep(3)
  
}
write.csv(df_new_res, "elevator_business_list.csv", row.names = FALSE)

Yelp API loop.

The process will follow similar steps to the Google API for retrieving ratings from Yelp.

library(httr)
library(jsonlite)
load_dot_env()
y_api_key <- Sys.getenv("yelpkey")
y_url <- "https://api.yelp.com/v3/businesses/search"


df_yelp_res <- data.frame(
  business_name = character(),
  business_yelp_rating=double()
)

for (i in seq_len(nrow(df_ele_business))) {
  
    # Perform your logic here
    search_yelp_input <- df_ele_business[i, 2]
    search_yelp_zip_input <- df_ele_business[i, 4]
  # Define parameters
  params <- list(
    term = search_yelp_input,
    location = search_yelp_zip_input
  )
  
  # Make the GET request
  y_response <- GET(
    url = y_url,
    add_headers(Authorization = paste("Bearer", y_api_key)),
    query = params
  )
  
  # Parse the JSON response
  y_data <- fromJSON(content(y_response, as = "text", encoding = "UTF-8"))
  
  # Access the rating
  y_rating <- y_data$businesses$rating[1]
  print(y_rating)
  
  add_yelp_bus_info <- tibble(
    business_name = df_ele_business[i, 2],
    business_yelp_rating= y_rating
    
  )
  df_yelp_res<- bind_rows(df_yelp_res,add_yelp_bus_info)
}

head(df_yelp_res)
write.csv(df_yelp_res, "elevator_business_list_yelp.csv", row.names = FALSE)

Combine Goolge rating and Yelp rating

Now we have two sets of rating and location information. We will combine both ratings into a single data frame for further data processing.

df_yelp_res <- read.csv("elevator_business_list_yelp.csv")
head(df_yelp_res)
              business_name business_yelp_rating
1     ROTAVELE ELEVATOR INC                  5.0
2   SCHINDLER ELEVATOR CORP                  4.8
3 NEW YORK CITY HOUSING AUT                  4.8
4 ALLIANCE ELEV COMPANY LLC                  5.0
5   MILLENNIUM ELEV ENT LLC                  4.3
6        START ELEVATOR LLC                  5.0
df_ele_rating <- right_join(df_new_res,df_yelp_res,by="business_name")
df_ele_rating <- df_ele_rating %>%
  select(-(2:4))
head(df_ele_rating)
              business_name business_rating business_yelp_rating
1     ROTAVELE ELEVATOR INC             3.9                  5.0
2   SCHINDLER ELEVATOR CORP             3.4                  4.8
3 NEW YORK CITY HOUSING AUT             3.8                  4.8
4 ALLIANCE ELEV COMPANY LLC              NA                  5.0
5   MILLENNIUM ELEV ENT LLC             3.7                  4.3
6        START ELEVATOR LLC             3.6                  5.0

Prepare for recommendation

recommendation_list <- df_ele_rating %>%
  mutate(
    # Replace NA with 0 or the other rating so the calculation doesn't fail
    adj_rating = ifelse(is.na(business_rating), business_yelp_rating, business_rating),
    adj_yelp = ifelse(is.na(business_yelp_rating), business_rating, business_yelp_rating),
    
    final_score = (adj_rating * 0.7) + (adj_yelp * 0.3)
  ) 
head(recommendation_list)
              business_name business_rating business_yelp_rating adj_rating
1     ROTAVELE ELEVATOR INC             3.9                  5.0        3.9
2   SCHINDLER ELEVATOR CORP             3.4                  4.8        3.4
3 NEW YORK CITY HOUSING AUT             3.8                  4.8        3.8
4 ALLIANCE ELEV COMPANY LLC              NA                  5.0        5.0
5   MILLENNIUM ELEV ENT LLC             3.7                  4.3        3.7
6        START ELEVATOR LLC             3.6                  5.0        3.6
  adj_yelp final_score
1      5.0        4.23
2      4.8        3.82
3      4.8        4.10
4      5.0        5.00
5      4.3        3.88
6      5.0        4.02
library(dplyr)
library(ggplot2)

df_summary <- recommendation_list %>%
  group_by(final_score) %>%
  summarise(count = n()) %>%
  ungroup()

ggplot(df_summary, aes(x = final_score, y = count, fill = final_score)) +
  geom_col(width = 0.1) +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  labs(
    x = "Final Score",
    y = "Number of Businesses",
    title = "Distribution of Final Scores (Exact Values)"
  ) +
  theme_minimal()+
  theme(legend.title = element_blank())
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).

Create a geo map based on the business location.

It appears that some businesses have closed, so the NA values will be removed from the dataset.

# Install the required packages if you don't have them already:
#install.packages(c("tidygeocoder", "leaflet"))

library(tidygeocoder)
library(leaflet)
library(dplyr)


# # 1. Create a dataframe with some sample New York City addresses
# nyc_locations <- data.frame(
#   name = c("Empire State Building", "Barclays Center", "Central Park Zoo", "Apollo Theater"),
#   address = c(
#     "20 W 34th St., New York, NY 10001",
#     "620 Atlantic Ave, Brooklyn, NY 11217",
#     "E 64th St, New York, NY 10021",
#     "253 W 125th St, New York, NY 10027"
#   )
# )
# 
# # 2. Geocode the addresses to get Latitude and Longitude
# # We use 'osm' (OpenStreetMap) which is free and doesn't require an API key.
# cat("Geocoding NYC addresses (this may take a few seconds)...\n")
# nyc_coordinates <- nyc_locations %>%
#   geocode(address, method = 'osm', lat = latitude, long = longitude)
# 
# # Print the resulting coordinates to the console
# print(nyc_coordinates)

# 3. Plot the geocoded coordinates on an interactive NYC map using leaflet
nyc_map <- leaflet(na.omit(df_new_res)) %>%
  # Use CartoDB's Positron theme for a clean, light-colored base map
  addProviderTiles(providers$CartoDB.Positron) %>%
  
  # Center the map roughly over New York City (Zoom level 11 is good for cities)
  setView(lng = -73.98, lat = 40.73, zoom = 11) %>%
  
  # Add pins to the map for each location
  addMarkers(
    lng = ~business_lng,
    lat = ~business_lat,
    # Create a popup label that appears when you click the pin
    popup = ~paste0("<b>", business_name, "</b><br/>", business_phone)
  )

# Display the map in the RStudio Viewer
nyc_map

Based on the open data, we only have 320 elevator businesses that have submitted job applications to the DOB.

#remotes::install_github("mfherman/nycgeo")
library(nycgeo)
To work with the spatial data included in this package, you should also load the {sf} package with library(sf).
df_ele_business<-df_ele_business %>% rename(business_name = Applicant.Business.Name)
df_g_rating_zip <- df_new_res %>%
  right_join(
    df_ele_business %>% 
      select(business_name,Applicant.Zip),by = "business_name"
  )
head(df_g_rating_zip)
              business_name business_phone business_lat business_lng
1     ROTAVELE ELEVATOR INC (718) 386-3000     40.70631    -73.91397
2   SCHINDLER ELEVATOR CORP (212) 708-1000     40.75608    -73.98367
3 NEW YORK CITY HOUSING AUT (718) 707-7771     40.74185    -73.94644
4 ALLIANCE ELEV COMPANY LLC           <NA>           NA           NA
5   MILLENNIUM ELEV ENT LLC (718) 615-0406     40.58903    -73.94286
6        START ELEVATOR LLC (718) 324-9166     40.89859    -73.85939
  business_rating Applicant.Zip
1             3.9         11385
2             3.4         10036
3             3.8         11510
4              NA         11106
5             3.7         11235
6             3.6         10466
recommendation_list %>%
  slice_max(final_score, n = 5)%>%
  slice_head(n = 10)
                  business_name business_rating business_yelp_rating adj_rating
1     ALLIANCE ELEV COMPANY LLC              NA                    5          5
2      TRANSEL ELEV & ELECT,INC              NA                    5          5
3     L.C.D. ELEVATOR REPAIR IN              NA                    5          5
4    CHANG PARK ARCHITECT, P.C.               5                    5          5
5     PATUXENT ENGINEERING, LLC               5                    5          5
6         EAGLE ELEVATOR CO INC               5                    5          5
7     CONCORD ELEVATOR IND, INC              NA                    5          5
8            P&W ELEVATORS, INC               5                    5          5
9           KN3 ENGINEERING LLC              NA                    5          5
10 PAUL PERDEK, PROF. ENG. PLLC              NA                    5          5
   adj_yelp final_score
1         5           5
2         5           5
3         5           5
4         5           5
5         5           5
6         5           5
7         5           5
8         5           5
9         5           5
10        5           5
df_5_elev <- recommendation_list %>%
  slice_max(final_score, n = 5) %>%
  left_join(df_new_res,by="business_name")%>%
  drop_na()

top_nyc_map <- leaflet(na.omit(df_5_elev)) %>%
  # Use CartoDB's Positron theme for a clean, light-colored base map
  addProviderTiles(providers$CartoDB.Positron) %>%

  # Center the map roughly over New York City (Zoom level 11 is good for cities)
  setView(lng = -73.98, lat = 40.73, zoom = 11) %>%

  # Add pins to the map for each location
  addMarkers(
    lng = ~business_lng,
    lat = ~business_lat,
    # Create a popup label that appears when you click the pin
    popup = ~paste0("<b>", business_name, "</b><br/>", business_phone)
  )

# Display the map in the RStudio Viewer
top_nyc_map

We can now observe that there are only a small number of 5-star elevator companies across NYC, with most of them located in Lower Manhattan.

Conclusion

Due to time and data limitations, the recommendation list may not accurately capture the spatial relationship between service companies and buildings. Furthermore, using only two rating sources may not be sufficient to fully ensure that the listed businesses meet the intended requirements.

Project Reflection

However, the project also included several days of structured review and practice sessions. A key takeaway from this work is that LLMs have inherent limitations; for instance, they may reference non-existent libraries or produce inaccurate statements, which necessitated additional time for verification and correction of the implemented code. Throughout this final project, I gained substantial learning experience through the iterative process of refining API calls, consulting LLM-generated outputs, and independently validating their accuracy. This back-and-forth workflow underscored the importance of critically evaluating LLM responses rather than accepting them at face value, particularly in the context of software development and data processing tasks.