#install.packages("dotenv")
library(dotenv)
load_dot_env()SPS_Data607_FinalProject_DC
Introduction
In New York City’s vertical landscape, elevator reliability is not merely a convenience—it is a critical component of urban mobility, accessibility, and public safety.With over 70,000 elevators across the five boroughs, building owners and facility managers face a significant challenge in selecting maintenance providers that are both technically proficient and customer-oriented.
This project introduces a data-driven framework to generate a comprehensive Elevator Company Recommendation List.
The Multi-Source Approach
To provide a holistic view of contractor performance, the project integrates three distinct data streams:
NYC Department of Buildings (DoB) Open Data: We utilize the Elevator job applications
submittion dataset.
Google Places API: This provides essential geographic and operational context,
allowing us to map service providers.
Yelp Fusion API: Incorporating Yelp reviews allows us to add antoher weigh for rating.
Project Objective
The ultimate goal of this analysis is to produce a ranked directory of elevator maintenance companies. By quantifying factors such as mean time between violations, average customer rating, and geographic coverage, this tool empowers NYC stakeholders to make evidence-based decisions that enhance building safety and tenant satisfaction.
API key saved in .env file.
Structure .env File, the file follows the standard KEY=VALUE format without spaces around the equals sign:
Google API
The Google Places API is a web service provided through the Google Maps Platform that lets developers access detailed information about geographic locations and businesses around the world.
Here is my example of using Google API to get rating and other information.
#install.packages("googleway")
library(googleway)
load_dot_env()
g_api_key <- Sys.getenv("gAPIkey")
#print(g_api_key)
g_res <- google_places(search_string = "SCHINDLER ELEVATOR CORP 10036",
key = g_api_key)
print(g_res$results[, c("name", "formatted_address", "rating", "place_id")]) name
1 Schindler Elevators & Escalators in New York City
formatted_address rating
1 1133 6th Ave 28th floor, New York, NY 10036, USA 3.4
place_id
1 ChIJK7Xpp09YwokR2rC-SNLPSV0
target_id <- g_res$results$place_id[1]
# 2. Get full details for that specific business
g_details <- google_place_details(place_id = target_id, key = g_api_key)
# Access specific info
business_phone <- g_details$result$formatted_phone_number
business_site <- g_details$result$website
business_hours <- g_details$result$opening_hours$weekday_text
print(business_hours)[1] "Monday: 8:00 AM – 5:00 PM" "Tuesday: 8:00 AM – 5:00 PM"
[3] "Wednesday: 8:00 AM – 5:00 PM" "Thursday: 8:00 AM – 5:00 PM"
[5] "Friday: 8:00 AM – 5:00 PM" "Saturday: Closed"
[7] "Sunday: Closed"
Yelp API
The Yelp Fusion API is the official developer API provided by Yelp, designed to give applications access to Yelp’s large database of local business information, reviews, and ratings.
Unlike mapping-focused APIs, Yelp’s API is centered more on consumer review and business reputation data, making it especially useful for recommendation and discovery apps.
Here is my Example of using Yelp API to get rating.
#install.packages("yelpr")
#library(yelpr)library(httr)
library(jsonlite)
load_dot_env()
y_api_key <- Sys.getenv("yelpkey")
y_url <- "https://api.yelp.com/v3/businesses/search"
# Define parameters
params <- list(
term = "SCHINDLER ELEVATOR CORP ",
location = "10036"
)
# Make the GET request
y_response <- GET(
url = y_url,
add_headers(Authorization = paste("Bearer", y_api_key)),
query = params
)
# Parse the JSON response
y_data <- fromJSON(content(y_response, as = "text", encoding = "UTF-8"))
# Access the rating
y_rating <- y_data$businesses$rating[1]
print(y_rating)[1] 4.9
Load data from Opendata NYC
The DOB NOW Elevator permit dataset was subset to include only the Applicant.License.Number”, “Applicant.Business.Name”, “Applicant.Address”, “Applicant.First.Name”, “Applicant.Last.Name”, “Applicant.City”, “Applicant.Zip, and ‘Applicant License Number’ columns.
This focused selection significantly reduces the dataframe’s memory footprint.
tmp_df_elevator <- read.csv("DOB_NOW__Build_Elevator_Permit.csv")
colnames(tmp_df_elevator) [1] "Job.Filling.Number" "Job.Number"
[3] "Filing.Number" "Filing.Date"
[5] "Filing.Type" "Elevator.Device.Type"
[7] "Filing.Status" "Filing.Status.or.Filing.Includes"
[9] "Building.Code" "Electrical.Permit.Number"
[11] "BIN" "House.Number"
[13] "Street.Name" "ZIP"
[15] "Borough" "Block"
[17] "Lot" "BuildingType"
[19] "Building.Stories" "Is.In.Conjunction"
[21] "Associated.BIS.Job.Number" "Total.Construction.Floor.Area"
[23] "Plan.Examiner.Assigned.Date" "Incomplete.Date"
[25] "Last.Incomplete.Submission.Date" "X1st.Objection.Date"
[27] "Last.Objection.Date" "Resubmission.Date"
[29] "Permit.Entire.Date" "Signed.Off.Date"
[31] "Applicant.First.Name" "Applicant.Last.Name"
[33] "Applicant.Business.Name" "Applicant.Address"
[35] "Applicant.City" "Applicant.State"
[37] "Applicant.Zip" "Applicant.License.Number"
[39] "Design.Professional.First.Name" "Design.Professional.Last.Name"
[41] "Design.Professional.Business.Name" "Design.Professional.Address"
[43] "Design.Professional.City" "Design.Professional.State"
[45] "Design.Professional..Zip" "Design.Professional.License.Number"
[47] "Owner.First.Name" "Owner.Last.Name"
[49] "Owner.Title" "Owner.Business.Name"
[51] "Owner.Address" "Owner.City"
[53] "Owner.State" "Owner.Zip"
[55] "Owner.Type" "Asbestos.Abatement.Complaince"
[57] "DEP.ACP.5.Control.No" "Scope.of.Work"
[59] "General.Liability" "GL.Certificate.Policy.No"
[61] "GL.Expiration.Date" "Workers.Compensation"
[63] "WC.Certificate.Policy.No" "WC.Expiration.Date"
[65] "Disability" "Disability.Certificate.Policy.No"
[67] "Disability.Expiration.Date" "Estimated.Cost"
[69] "Filing.Fee" "No.Good.Check"
[71] "Total.Fee" "Amount.Paid"
[73] "Amount.Due" "Review.Type..PPN."
[75] "Permit.Expiration.Date" "LATITUDE"
[77] "LONGITUDE" "COMMUNITY_DISTRICT_NUMBER"
[79] "CITY_COUNCIL_DISTRICT" "CENSUS_TRACT"
[81] "BBL" "NTA_NAME"
Subset the dataframe
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(tidyr)
df_elevator <- tmp_df_elevator[c("Applicant.License.Number",
"Applicant.Business.Name",
"Applicant.Address",
"Applicant.First.Name",
"Applicant.Last.Name",
"Applicant.City",
"Applicant.Zip")]
head(df_elevator) Applicant.License.Number Applicant.Business.Name Applicant.Address
1 L - 445001 ROTAVELE ELEVATOR INC 414 SENECA AVENUE
2 L - 074117 SCHINDLER ELEVATOR CORP
3 L - 159169 NEW YORK CITY HOUSING AUT 23-02 49TH AVE
4 L - 470001 ALLIANCE ELEV COMPANY LLC
5 L - 556001 MILLENNIUM ELEV ENT LLC 2618 AVENUE Z 1ST FLOOR
6 L - 564001 START ELEVATOR LLC
Applicant.First.Name Applicant.Last.Name Applicant.City Applicant.Zip
1 FRANK TORTORELLA RIDGEWOOD 11385
2 WILLIAM MENSCHE NEW YORK 10036
3 ALVIN DUNCAN BALDWIN 11510
4 DONALD BASEDOW JR. LONG ISLAND CITY 11106
5 BORIS EVELKIN BROOKLYN 11235
6 KEVIN SHEA BRONX 10466
df_ele_business <- df_elevator %>%
select(Applicant.Business.Name,Applicant.Address,Applicant.Zip)%>%
distinct(Applicant.Business.Name, .keep_all = TRUE)
head(df_ele_business) Applicant.Business.Name Applicant.Address Applicant.Zip
1 ROTAVELE ELEVATOR INC 414 SENECA AVENUE 11385
2 SCHINDLER ELEVATOR CORP 10036
3 NEW YORK CITY HOUSING AUT 23-02 49TH AVE 11510
4 ALLIANCE ELEV COMPANY LLC 11106
5 MILLENNIUM ELEV ENT LLC 2618 AVENUE Z 1ST FLOOR 11235
6 START ELEVATOR LLC 10466
dim(df_ele_business)[1] 320 3
df_ele_business <- df_ele_business%>%
unite("Business_info", 1:3, sep = " ", remove = FALSE)
dim(df_ele_business)[1] 320 4
Add a column that combines the business name, address, and ZIP code to use as the search input.
Looping the Google API.
Iterate through the API calls to fetch complete business information, including longitude, latitude, and rating.
Here , I pre-run the loop and saved the data into the csv file.
df_new_res <- read.csv( "elevator_business_list.csv")
head(df_new_res) business_name business_phone business_lat business_lng
1 ROTAVELE ELEVATOR INC (718) 386-3000 40.70631 -73.91397
2 SCHINDLER ELEVATOR CORP (212) 708-1000 40.75608 -73.98367
3 NEW YORK CITY HOUSING AUT (718) 707-7771 40.74185 -73.94644
4 ALLIANCE ELEV COMPANY LLC <NA> NA NA
5 MILLENNIUM ELEV ENT LLC (718) 615-0406 40.58903 -73.94286
6 START ELEVATOR LLC (718) 324-9166 40.89859 -73.85939
business_rating
1 3.9
2 3.4
3 3.8
4 NA
5 3.7
6 3.6
This is the actual loop that calls the API, processes the response, and saves it to a file.
library(jsonlite)
#rm(flat_res)
df_new_res <- data.frame(
business_name = character(),
business_phone= character(),
business_lat= double(),
business_lng= double(),
business_rating=double()
)
for (i in seq_len(nrow(df_ele_business))) {
#for (i in 1:5) {
# Perform your logic here
search_str_input <- df_ele_business[i, 1]
g_res <- google_places(search_string = search_str_input,
key = g_api_key)
#print(g_res$results[, c("name", "formatted_address", "rating", "place_id")])
target_id <- g_res$results$place_id[1]
# 2. Get full details for that specific business
g_details <- google_place_details(place_id = target_id, key = g_api_key)
# Access specific info
bus_phone <- g_details$result$formatted_phone_number
bus_hours <- g_details$result$opening_hours$weekday_text
bus_lat <- g_details$result$geometry$location$lat
bus_lng <- g_details$result$geometry$location$lng
bus_rating <- g_details$result$rating
add_bus_info <- tibble(
business_name = df_ele_business[i, 2],
business_phone= bus_phone,
business_lat=bus_lat,
business_lng= bus_lng,
business_rating= bus_rating
)
df_new_res<- bind_rows(df_new_res,add_bus_info)
#df_new_res[i,1:5] <- c(,business_phone,business_lat,business_lng,business_rating)
# flat_res <- flatten(as.data.frame(g_details$result))
# in flatten(g_details$result) : is.data.frame(x) is not TRUE
# flat_res <- flat_res %>%
# mutate(
# business_name = df_ele_business[i, 2],
# business_address=df_ele_business[i, 3],
# business_zipcode=df_ele_business[i, 4]
# )
#
# df_new_res <- df_new_res %>%
# right_join(flat_res,by="business_name")
Sys.sleep(3)
}
write.csv(df_new_res, "elevator_business_list.csv", row.names = FALSE)Yelp API loop.
The process will follow similar steps to the Google API for retrieving ratings from Yelp.
library(httr)
library(jsonlite)
load_dot_env()
y_api_key <- Sys.getenv("yelpkey")
y_url <- "https://api.yelp.com/v3/businesses/search"
df_yelp_res <- data.frame(
business_name = character(),
business_yelp_rating=double()
)
for (i in seq_len(nrow(df_ele_business))) {
# Perform your logic here
search_yelp_input <- df_ele_business[i, 2]
search_yelp_zip_input <- df_ele_business[i, 4]
# Define parameters
params <- list(
term = search_yelp_input,
location = search_yelp_zip_input
)
# Make the GET request
y_response <- GET(
url = y_url,
add_headers(Authorization = paste("Bearer", y_api_key)),
query = params
)
# Parse the JSON response
y_data <- fromJSON(content(y_response, as = "text", encoding = "UTF-8"))
# Access the rating
y_rating <- y_data$businesses$rating[1]
print(y_rating)
add_yelp_bus_info <- tibble(
business_name = df_ele_business[i, 2],
business_yelp_rating= y_rating
)
df_yelp_res<- bind_rows(df_yelp_res,add_yelp_bus_info)
}
head(df_yelp_res)
write.csv(df_yelp_res, "elevator_business_list_yelp.csv", row.names = FALSE)Combine Goolge rating and Yelp rating
Now we have two sets of rating and location information. We will combine both ratings into a single data frame for further data processing.
df_yelp_res <- read.csv("elevator_business_list_yelp.csv")
head(df_yelp_res) business_name business_yelp_rating
1 ROTAVELE ELEVATOR INC 5.0
2 SCHINDLER ELEVATOR CORP 4.8
3 NEW YORK CITY HOUSING AUT 4.8
4 ALLIANCE ELEV COMPANY LLC 5.0
5 MILLENNIUM ELEV ENT LLC 4.3
6 START ELEVATOR LLC 5.0
df_ele_rating <- right_join(df_new_res,df_yelp_res,by="business_name")
df_ele_rating <- df_ele_rating %>%
select(-(2:4))
head(df_ele_rating) business_name business_rating business_yelp_rating
1 ROTAVELE ELEVATOR INC 3.9 5.0
2 SCHINDLER ELEVATOR CORP 3.4 4.8
3 NEW YORK CITY HOUSING AUT 3.8 4.8
4 ALLIANCE ELEV COMPANY LLC NA 5.0
5 MILLENNIUM ELEV ENT LLC 3.7 4.3
6 START ELEVATOR LLC 3.6 5.0
Prepare for recommendation
recommendation_list <- df_ele_rating %>%
mutate(
# Replace NA with 0 or the other rating so the calculation doesn't fail
adj_rating = ifelse(is.na(business_rating), business_yelp_rating, business_rating),
adj_yelp = ifelse(is.na(business_yelp_rating), business_rating, business_yelp_rating),
final_score = (adj_rating * 0.7) + (adj_yelp * 0.3)
)
head(recommendation_list) business_name business_rating business_yelp_rating adj_rating
1 ROTAVELE ELEVATOR INC 3.9 5.0 3.9
2 SCHINDLER ELEVATOR CORP 3.4 4.8 3.4
3 NEW YORK CITY HOUSING AUT 3.8 4.8 3.8
4 ALLIANCE ELEV COMPANY LLC NA 5.0 5.0
5 MILLENNIUM ELEV ENT LLC 3.7 4.3 3.7
6 START ELEVATOR LLC 3.6 5.0 3.6
adj_yelp final_score
1 5.0 4.23
2 4.8 3.82
3 4.8 4.10
4 5.0 5.00
5 4.3 3.88
6 5.0 4.02
library(dplyr)
library(ggplot2)
df_summary <- recommendation_list %>%
group_by(final_score) %>%
summarise(count = n()) %>%
ungroup()
ggplot(df_summary, aes(x = final_score, y = count, fill = final_score)) +
geom_col(width = 0.1) +
scale_fill_gradient(low = "lightblue", high = "darkblue") +
labs(
x = "Final Score",
y = "Number of Businesses",
title = "Distribution of Final Scores (Exact Values)"
) +
theme_minimal()+
theme(legend.title = element_blank())Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
Create a geo map based on the business location.
It appears that some businesses have closed, so the NA values will be removed from the dataset.
# Install the required packages if you don't have them already:
#install.packages(c("tidygeocoder", "leaflet"))
library(tidygeocoder)
library(leaflet)
library(dplyr)
# # 1. Create a dataframe with some sample New York City addresses
# nyc_locations <- data.frame(
# name = c("Empire State Building", "Barclays Center", "Central Park Zoo", "Apollo Theater"),
# address = c(
# "20 W 34th St., New York, NY 10001",
# "620 Atlantic Ave, Brooklyn, NY 11217",
# "E 64th St, New York, NY 10021",
# "253 W 125th St, New York, NY 10027"
# )
# )
#
# # 2. Geocode the addresses to get Latitude and Longitude
# # We use 'osm' (OpenStreetMap) which is free and doesn't require an API key.
# cat("Geocoding NYC addresses (this may take a few seconds)...\n")
# nyc_coordinates <- nyc_locations %>%
# geocode(address, method = 'osm', lat = latitude, long = longitude)
#
# # Print the resulting coordinates to the console
# print(nyc_coordinates)
# 3. Plot the geocoded coordinates on an interactive NYC map using leaflet
nyc_map <- leaflet(na.omit(df_new_res)) %>%
# Use CartoDB's Positron theme for a clean, light-colored base map
addProviderTiles(providers$CartoDB.Positron) %>%
# Center the map roughly over New York City (Zoom level 11 is good for cities)
setView(lng = -73.98, lat = 40.73, zoom = 11) %>%
# Add pins to the map for each location
addMarkers(
lng = ~business_lng,
lat = ~business_lat,
# Create a popup label that appears when you click the pin
popup = ~paste0("<b>", business_name, "</b><br/>", business_phone)
)
# Display the map in the RStudio Viewer
nyc_mapBased on the open data, we only have 320 elevator businesses that have submitted job applications to the DOB.
#remotes::install_github("mfherman/nycgeo")
library(nycgeo)To work with the spatial data included in this package, you should also load the {sf} package with library(sf).
df_ele_business<-df_ele_business %>% rename(business_name = Applicant.Business.Name)
df_g_rating_zip <- df_new_res %>%
right_join(
df_ele_business %>%
select(business_name,Applicant.Zip),by = "business_name"
)
head(df_g_rating_zip) business_name business_phone business_lat business_lng
1 ROTAVELE ELEVATOR INC (718) 386-3000 40.70631 -73.91397
2 SCHINDLER ELEVATOR CORP (212) 708-1000 40.75608 -73.98367
3 NEW YORK CITY HOUSING AUT (718) 707-7771 40.74185 -73.94644
4 ALLIANCE ELEV COMPANY LLC <NA> NA NA
5 MILLENNIUM ELEV ENT LLC (718) 615-0406 40.58903 -73.94286
6 START ELEVATOR LLC (718) 324-9166 40.89859 -73.85939
business_rating Applicant.Zip
1 3.9 11385
2 3.4 10036
3 3.8 11510
4 NA 11106
5 3.7 11235
6 3.6 10466
recommendation_list %>%
slice_max(final_score, n = 5)%>%
slice_head(n = 10) business_name business_rating business_yelp_rating adj_rating
1 ALLIANCE ELEV COMPANY LLC NA 5 5
2 TRANSEL ELEV & ELECT,INC NA 5 5
3 L.C.D. ELEVATOR REPAIR IN NA 5 5
4 CHANG PARK ARCHITECT, P.C. 5 5 5
5 PATUXENT ENGINEERING, LLC 5 5 5
6 EAGLE ELEVATOR CO INC 5 5 5
7 CONCORD ELEVATOR IND, INC NA 5 5
8 P&W ELEVATORS, INC 5 5 5
9 KN3 ENGINEERING LLC NA 5 5
10 PAUL PERDEK, PROF. ENG. PLLC NA 5 5
adj_yelp final_score
1 5 5
2 5 5
3 5 5
4 5 5
5 5 5
6 5 5
7 5 5
8 5 5
9 5 5
10 5 5
df_5_elev <- recommendation_list %>%
slice_max(final_score, n = 5) %>%
left_join(df_new_res,by="business_name")%>%
drop_na()
top_nyc_map <- leaflet(na.omit(df_5_elev)) %>%
# Use CartoDB's Positron theme for a clean, light-colored base map
addProviderTiles(providers$CartoDB.Positron) %>%
# Center the map roughly over New York City (Zoom level 11 is good for cities)
setView(lng = -73.98, lat = 40.73, zoom = 11) %>%
# Add pins to the map for each location
addMarkers(
lng = ~business_lng,
lat = ~business_lat,
# Create a popup label that appears when you click the pin
popup = ~paste0("<b>", business_name, "</b><br/>", business_phone)
)
# Display the map in the RStudio Viewer
top_nyc_mapWe can now observe that there are only a small number of 5-star elevator companies across NYC, with most of them located in Lower Manhattan.
Conclusion
Due to time and data limitations, the recommendation list may not accurately capture the spatial relationship between service companies and buildings. Furthermore, using only two rating sources may not be sufficient to fully ensure that the listed businesses meet the intended requirements.
Project Reflection
However, the project also included several days of structured review and practice sessions. A key takeaway from this work is that LLMs have inherent limitations; for instance, they may reference non-existent libraries or produce inaccurate statements, which necessitated additional time for verification and correction of the implemented code. Throughout this final project, I gained substantial learning experience through the iterative process of refining API calls, consulting LLM-generated outputs, and independently validating their accuracy. This back-and-forth workflow underscored the importance of critically evaluating LLM responses rather than accepting them at face value, particularly in the context of software development and data processing tasks.